TITLE

GENES INVOLVED IN ISOPRENOID COMPOUND PRODUCTION

This application claims priority to provisional application No. 60/285,910, filed April 24, 2001.
FIELD OF THE INVENTION

This invention is in the field of microbiology. More specifically, this invention pertains to nucleic acid fragments encoding enzymes useful for microbial production of isoprenoid compounds.

BACKGROUND OF THE INVENTION 

Isoprenoids are an extremely large and diverse group of natural products that have a common biosynthetic origin, a single metabolic precursor, isopentenyl diphosphate (IPP). Isoprenoids include all substances that are derived biosynthetically from the 5-carbon compound IPP (Spurgeon and Porter, Biosynthesis of Isoprenoid Compounds, pp 3-46, A Wiley-Interscience Publication (1981)). Some isoprenoids are also referred to as "terpenes" or "terpenoids". Isoprenoids are ubiquitous compounds found in all living organisms. Well-known examples of isoprenoids include steroids (triterpenes), carotenoids (tetraterpenes), and squalene.

For many years, it was accepted that IPP was synthesized through the well-known acetate/mevalonate pathway. However, recent studies have demonstrated that this mevalonate-dependent pathway does not operate in all living organisms. An alternate, mevalonate-independent pathway for IPP biosynthesis was initially characterized in bacteria and later in green algae and higher plants (Horbach et al., FEMS Microbiol. Lett. 111:135-140 (1993); Rohmer et al., Biochem. J. 295:517-524 (1993); Schwender et al., Biochem. J. 316:73-80 (1996); Eisenreich et al., Proc. Natl. Acad. Sci. USA 93:6431-6436 (1996)).

Many steps in the mevalonate-independent isoprenoid pathway are known. For example, the initial steps involve the condensation of pyruvate and D-glyceraldehyde 3-phosphate to yield the 5-carbon compound D-1-deoxyxylulose-5-phosphate. A gene, dxs, encoding D-1-deoxyxylulose-5-phosphate synthase (DXS), which catalyzes the synthesis of D-1-deoxyxylulose-5-phosphate, was reported in Mycobacterium tuberculosis (Cole et al., Nature 393:537-544, 1998).

Next, the isomerization and reduction of D-1-deoxyxylulose-5-phosphate yields 2-C-methyl-D-erythritol-4-phosphate. One of the enzymes involved in the isomerization and reduction process is D-1-deoxyxylulose-5-phosphate reductoisomerase (DXR). The dxr gene product, which catalyzes the formation of 2-C-methyl-D-erythritol-4-phosphate, has been reported in Mycobacterium tuberculosis (Cole et al., supra).
Steps converting 2-C-methyl-D-erythritol-4-phosphate to isopentenyl monophosphate are not well characterized, although some steps are known. 2-C-methyl-D-erythritol-4-phosphate is converted into 4-diphosphocytidyl-2C-methyl-D-erythritol in a CTP-dependent reaction by the enzyme encoded by the non-annotated gene ygbP. It has been reported that the YgbP protein is present in Mycobacterium tuberculosis, where it catalyzes the reaction mentioned above (Cole et al., supra). Recently, the ygbP gene was renamed ispD as part of the isp gene cluster (SwissProt #Q46893) (Cole et al., supra).

The 2nd position hydroxy group of 4-diphosphocytidyl-2C-methyl-D-erythritol can be phosphorylated in an ATP-dependent reaction by the enzyme encoded by the ychB gene. The ychB gene product phosphorylates 4-diphosphocytidyl-2C-methyl-D-erythritol, resulting in 4-diphosphocytidyl-2C-methyl-D-erythritol 2-phosphate. Cole et al. (supra) have reported a YchB protein in Mycobacterium tuberculosis. Recently, the ychB gene was renamed ispE as part of the isp gene cluster (SwissProt #P24209) (Cole et al., supra).

The product of the ygbB gene converts 4-diphosphocytidyl-2C-methyl-D-erythritol 2-phosphate to 2C-methyl-D-erythritol 2,4-cyclodiphosphate. Cole et al. (supra) reported the ygbB gene product in Mycobacterium tuberculosis (Nature 393:537-544, 1998). 2C-methyl-D-erythritol 2,4-cyclodiphosphate can be further converted into carotenoids through the carotenoid biosynthesis pathway. Recently, the ygbB gene was renamed ispF as part of the isp gene cluster (SwissProt #P36663). The reaction catalyzed by the YgbP enzyme is carried out in a CTP-dependent manner. Isopentenyl monophosphate and isopentenyl diphosphate (IPP) are formed through a series of reactions that are not yet characterized but have recently been proposed to be mediated by LytB and GcpE (Cunningham et al., J. Bacteriol., 182:5841-5848, 2000; McAteer et al., J. Bacteriol., 183:7403-7407, 2000).

In E. coli, IPP can be converted to dimethylallyl diphosphate (DMAPP) by an isomerization reaction catalyzed by the product of the idi gene, which is dispensable, suggesting that DMAPP and IPP are produced independently (McAteer et al., J. Bacteriol., 183:7403-7407, 2000). There is a broad group of enzymes catalyzing the consecutive condensation of isopentenyl diphosphate (IPP), resulting in the formation of prenyl diphosphates of various chain lengths. Homologous prenyl transferase genes have highly conserved regions in their amino acid sequences. Examples are heptaprenyl synthase, geranylgeranyl (C20) diphosphate synthase (Cole et al., supra), and farnesyl (C15) diphosphate synthase, which can catalyze the synthesis of five prenyl diphosphates of various lengths.

Formation of C40 phytoene is carried out by the crtB gene, which encodes phytoene synthase. Phytoene is formed by condensation of two molecules of the C20 precursor geranylgeranyl pyrophosphate (GGPP). Phytoene synthase has been isolated from Streptomyces coelicolor (GenBank #T36969).

Further down the isoprenoid biosynthesis pathway, more genes are involved in the synthesis of carotenoids. The phytoene desaturation step is carried out by the crtI gene product, resulting in the formation of lycopene. A gene encoding phytoene dehydrogenase, crtI, has been isolated from Streptomyces coelicolor (GenBank #T36968).

Lycopene cyclization is carried out by the crtY/L gene product, lycopene cyclase. Lycopene cyclase has been isolated from Deinococcus radiodurans (White et al., Science, 286:1571-1577 (1999)).

Although many genes needed for isoprenoid and carotenoid synthesis have been characterized, the genes involved in the isoprenoid and/or carotenoid pathways in Rhodococcus bacteria are not described in the existing literature. There are many pigmented Rhodococcus bacteria, which suggests that the ability to produce carotenoid pigments is widespread in these bacteria.

The problem to be solved, therefore, is to isolate the sequences responsible for isoprenoid biosynthesis in Rhodococcus for their eventual use in isoprenoid and carotenoid production. Applicants have solved the stated problem by isolating a nucleic acid fragment from Rhodococcus erythropolis strain AN12 containing 10 open reading frames (ORFs) encoding enzymes involved in isoprenoid synthesis.

SUMMARY OF THE INVENTION 
Ten open reading frames, each encoding an enzyme in the isoprenoid biosynthetic pathway, have been identified and isolated from Rhodococcus erythropolis AN12. The present enzymes are useful for the production of isoprenoids in recombinant organisms. These compounds are difficult and expensive to produce chemically and have potent antioxidant properties that are beneficial to human and animal health. Rhodococcus strains are good production hosts and are particularly suited to the production of carotenoids, owing to the inherent capacity to produce these compounds found in many species of the genus.

The present invention provides an isolated nucleic acid molecule selected from the group consisting of:

(a) an isolated nucleic acid molecule encoding an isoprenoid biosynthetic enzyme having an amino acid sequence selected from the group consisting of SEQ ID NOs:2, 4, 6, 8, 10, 12, 14, 16, 18 and 20;

(b) an isolated nucleic acid molecule encoding an isoprenoid biosynthetic enzyme that hybridizes with (a) under the following hybridization conditions: 0.1X SSC, 0.1% SDS, 65°C and washed with 2X SSC, 0.1% SDS followed by 0.1X SSC, 0.1% SDS; or

(c) an isolated nucleic acid molecule that is complementary to (a) or (b).

Additionally, the invention provides chimeric genes comprising the instant nucleic acid fragments operably linked to appropriate regulatory sequences, and polypeptides encoded by the present nucleic acid fragments and chimeric genes.

The invention additionally provides transformed hosts comprising the instant nucleic acid sequences, wherein the host cells are selected from the group consisting of bacteria, yeast, filamentous fungi, algae, and green plants.

In another embodiment the invention provides a method of obtaining a nucleic acid molecule encoding an isoprenoid compound biosynthetic enzyme comprising:

(a) probing a genomic library with the nucleic acid molecule of any one of the present isolated nucleic acid sequences;

(b) identifying a DNA clone that hybridizes with the nucleic acid molecule of any one of the present nucleic acid sequences; and

(c) sequencing the genomic fragment that comprises the clone identified in step (b),

wherein the sequenced genomic fragment encodes an isoprenoid biosynthetic enzyme.

Similarly, the invention provides a method of obtaining a nucleic acid molecule encoding an isoprenoid biosynthetic enzyme comprising:

(a) synthesizing at least one oligonucleotide primer corresponding to a portion of the sequence selected from the group consisting of SEQ ID NOs:1, 3, 5, 7, 9, 11, 13, 15, 17 and 19; and

(b) amplifying an insert present in a cloning vector using the oligonucleotide primer of step (a);

wherein the amplified insert encodes a portion of an amino acid sequence of an isoprenoid biosynthetic enzyme.
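As an illustration only, step (a) can be sketched in Python under a simple assumed scheme that the application does not itself specify: the forward primer is the first n bases of a coding sequence, and the reverse primer is the reverse complement of its last n bases, so that both primers read 5' to 3'. The function names, the primer length, and the example sequence are hypothetical; the example is made up and is not one of SEQ ID NOs:1-19.

```python
# Hypothetical sketch of end-derived PCR primers; not a procedure from the application.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (5'->3')."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    """Percentage of G plus C bases in seq."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def design_primer_pair(cds: str, length: int = 20) -> tuple[str, str]:
    """Forward primer: first `length` bases.
    Reverse primer: reverse complement of the last `length` bases."""
    if len(cds) < 2 * length:
        raise ValueError("sequence too short for non-overlapping primers")
    cds = cds.upper()
    forward = cds[:length]
    reverse = reverse_complement(cds[-length:])
    return forward, reverse


if __name__ == "__main__":
    example = "ATGACCGGTAAACTGGCTTTCGGAGCATAA"  # made-up 30-nt sequence
    fwd, rev = design_primer_pair(example, length=12)
    print(fwd, rev, round(gc_percent(fwd), 1))
```

In practice primer length is usually chosen with melting temperature and specificity in mind; the fixed length here only keeps the sketch short.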

In another embodiment the invention provides a method for the production of isoprenoid compounds comprising: contacting a transformed host cell under suitable growth conditions with an effective amount of a fermentable carbon substrate whereby an isoprenoid compound is produced, said transformed host cell comprising a set of nucleic acid molecules encoding SEQ ID NOs:2, 4, 6, 8, 10, 12, 14, 16, 18 and 20 under the control of suitable regulatory sequences.
In an alternate embodiment the invention provides a method of regulating isoprenoid biosynthesis in an organism comprising over-expressing at least one isoprenoid gene selected from the group consisting of SEQ ID NOs:1, 3, 5, 7, 9, 11, 13, 15, 17 and 19 in an organism such that the isoprenoid biosynthesis is altered in the organism. The regulation of isoprenoid biosynthesis may be accomplished by expressing genes on a multicopy plasmid, by operably linking the relevant genes to regulated or inducible promoters, by antisense expression, or by selective disruption of certain genes in the pathway.

Additionally, a mutated gene is provided encoding an isoprenoid enzyme having an altered biological activity produced by a method comprising the steps of:

(i) digesting a mixture of nucleotide sequences with restriction endonucleases wherein said mixture comprises:

a) a native isoprenoid gene of the invention;

b) a first population of nucleotide fragments which will hybridize to said native isoprenoid gene of the invention;

c) a second population of nucleotide fragments which will not hybridize to said native isoprenoid gene of the invention;

wherein a mixture of restriction fragments is produced;

(ii) denaturing said mixture of restriction fragments;

(iii) incubating the denatured said mixture of restriction fragments of step (ii) with a polymerase;

(iv) repeating steps (ii) and (iii) wherein a mutated isoprenoid gene is produced encoding a protein having an altered biological activity.
BRIEF DESCRIPTION OF THE DRAWINGS 
AND SEQUENCE DESCRIPTIONS 
Figure 1 shows the isoprenoid pathway and the putative function of the isoprenoid genes identified in AN12.

Figure 2 shows HPLC analysis of carotenoid pigments from Rhodococcus erythropolis AN12 strain and ATCC 47072.

Figure 3 shows the targeted gene disruption by homologous recombination using the crtI gene as an example.

The invention can be more fully understood from the following detailed description and the accompanying sequence descriptions, which form a part of this application.

The following sequences comply with 37 C.F.R. 1.821-1.825 ("Requirements for Patent Applications Containing Nucleotide Sequences and/or Amino Acid Sequence Disclosures - the Sequence Rules") and are consistent with World Intellectual Property Organization (WIPO) Standard ST.25 (1998) and the sequence listing requirements of the EPO and PCT (Rules 5.2 and 49.5(a-bis), and Section 208 and Annex C of the Administrative Instructions). The symbols and format used for nucleotide and amino acid sequence data comply with the rules set forth in 37 C.F.R. §1.822.

SEQ ID NO:1 is the nucleotide sequence of ORF 1 encoding the dxs gene.

SEQ ID NO:2 is the deduced amino acid sequence of dxs encoded by ORF 1.

SEQ ID NO:3 is the nucleotide sequence of ORF 2 encoding the dxr gene.

SEQ ID NO:4 is the deduced amino acid sequence of dxr encoded by ORF 2.

SEQ ID NO:5 is the nucleotide sequence of ORF 3 encoding the ygbP (ispD) gene.

SEQ ID NO:6 is the deduced amino acid sequence of the ygbP (ispD) gene encoded by ORF 3.

SEQ ID NO:7 is the nucleotide sequence of ORF 4 encoding the ychB (ispE) gene.

SEQ ID NO:8 is the deduced amino acid sequence of ychB (ispE) encoded by ORF 4.

SEQ ID NO:9 is the nucleotide sequence of ORF 5 encoding the ygbB (ispF) gene.

SEQ ID NO:10 is the deduced amino acid sequence of ygbB (ispF) encoded by ORF 5.

SEQ ID NO:11 is the nucleotide sequence of ORF 6 encoding the ispA gene.

SEQ ID NO:12 is the deduced amino acid sequence of the ispA gene encoded by ORF 6.

SEQ ID NO:13 is the nucleotide sequence of ORF 7 encoding the crtE gene.

SEQ ID NO:14 is the deduced amino acid sequence of the crtE gene encoded by ORF 7.

SEQ ID NO:15 is the nucleotide sequence of ORF 8 encoding the crtB gene.

SEQ ID NO:16 is the deduced amino acid sequence of the crtB gene encoded by ORF 8.

SEQ ID NO:17 is the nucleotide sequence of ORF 9 encoding the crtI gene.

SEQ ID NO:18 is the deduced amino acid sequence of the crtI gene encoded by ORF 9.

SEQ ID NO:19 is the nucleotide sequence of ORF 10 encoding the crtL gene.

SEQ ID NO:20 is the deduced amino acid sequence of the crtL gene encoded by ORF 10.

SEQ ID NOs:21-^ are the primer sequences.

DETAILED DESCRIPTION OF THE INVENTION
The present genes and their expression products are useful for the creation of recombinant organisms that have the ability to produce various isoprenoid compounds, including carotenoid compounds. Nucleic acid fragments encoding the above-mentioned enzymes have been isolated from a strain of Rhodococcus erythropolis and identified by comparison to public databases containing nucleotide and protein sequences, using the BLAST and FASTA algorithms well known to those skilled in the art.

The genes and gene products of the present invention may be used in a variety of ways for the enhancement or manipulation of isoprenoid compounds.

The microbial isoprenoid pathway is naturally a multi-product platform for production of compounds such as carotenoids, quinones, squalene, and vitamins. These natural products may range from 5 carbon units to more than 55 carbon units in chain length. Microbial isoprenoid production has general practical utility for carotenoid compounds, as these compounds are very difficult to make chemically (Nelis and Leenheer, Appl. Bacteriol. 70:181-191 (1991)). Most carotenoids have strong color and can be viewed as natural pigments or colorants. Furthermore, many carotenoids have potent antioxidant properties, and thus inclusion of these compounds in the diet is thought to be healthful. Well-known examples are β-carotene and astaxanthin.

In the case of Rhodococcus erythropolis, the inherent capacity to produce carotenoids is particularly useful. Because Rhodococcus cells are resistant to many solvents and amenable to mixed-phase process development, it is advantageous to use a Rhodococcus strain as a production platform. Rhodococcus strains have been successfully used as production hosts for the commercial production of other chemicals, such as acrylamide.

The genes and gene sequences described herein enable one to incorporate the production of healthful carotenoids directly into the single cell protein product derived from Rhodococcus erythropolis. This makes this strain, or any bacterial strain into which these genes are incorporated, a more desirable production host for animal feed, owing to the presence of carotenoids, which are known to add desirable pigmentation and health benefits to the feed. Salmon and shrimp aquaculture are particularly useful applications for this invention, as carotenoid pigmentation is critically important for the value of these organisms (F. Shahidi, J. A. Brown, Carotenoid pigments in seafood and aquaculture, Critical Reviews in Food Science 38(1):1-67 (1998)).

In addition to their use in food supplements and feed additives, the genes are useful for the production of carotenoids and their derivatives, and of isoprenoid intermediates and their derivatives, as pure products useful as pigments, steroids, flavors and fragrances, and compounds with potential electro-optic applications.

In this disclosure, a number of terms and abbreviations are used. The following definitions are provided.

"Open reading frame" is abbreviated ORF.

"Polymerase chain reaction" is abbreviated PCR.

As used herein, an "isolated nucleic acid fragment" is a polymer of RNA or DNA that is single- or double-stranded, optionally containing synthetic, non-natural or altered nucleotide bases. An isolated nucleic acid fragment in the form of a polymer of DNA may be comprised of one or more segments of cDNA, genomic DNA or synthetic DNA.

The term "isoprenoid" or "terpenoid" refers to any molecule derived from the isoprenoid pathway, including 10-carbon terpenoids and their derivatives, such as carotenoids and xanthophylls.

The terms "Rhodococcus erythropolis AN12" and "AN12" both refer to the Rhodococcus erythropolis AN12 strain and are used interchangeably.

The terms "Rhodococcus erythropolis ATCC 47072" and "ATCC 47072" both refer to the Rhodococcus erythropolis ATCC 47072 strain and are used interchangeably.

The term "Dxs" refers to the 1-deoxyxylulose-5-phosphate synthase enzyme encoded by the dxs gene represented in ORF 1.

The term "Dxr" refers to the 1-deoxyxylulose-5-phosphate reductoisomerase enzyme encoded by the dxr gene represented in ORF 2.

The term "YgbP" or "IspD" refers to the 4-diphosphocytidyl-2C-methyl-D-erythritol synthase enzyme encoded by the ygbP or ispD gene represented in ORF 3. The gene names ygbP and ispD are used interchangeably in this application, as are the gene product names YgbP and IspD.

The term "YchB" or "IspE" refers to the isopentenyl monophosphate kinase enzyme encoded by the ychB or ispE gene represented in ORF 4. The gene names ychB and ispE are used interchangeably in this application, as are the gene product names YchB and IspE.

The term "YgbB" or "IspF" refers to the 2C-methyl-D-erythritol 2,4-cyclodiphosphate synthase enzyme encoded by the ygbB or ispF gene represented in ORF 5. The gene names ygbB and ispF are used interchangeably in this application, as are the gene product names YgbB and IspF.

The term "IspA" refers to the geranyltransferase or heptaprenyl diphosphate synthase enzyme, a member of the prenyl transferase family, encoded by the ispA gene represented in ORF 6.

The term "CrtE" refers to the geranylgeranyl pyrophosphate synthase enzyme encoded by the crtE gene represented in ORF 7.

The term "CrtB" refers to the phytoene synthase enzyme encoded by the crtB gene represented in ORF 8.

The term "CrtI" refers to the phytoene dehydrogenase enzyme encoded by the crtI gene represented in ORF 9.

The term "CrtL" refers to the lycopene cyclase enzyme encoded by the crtL gene represented in ORF 10.
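For reference, the ORF assignments defined above can be collected into a small lookup table. This Python structure is a hypothetical illustration, not part of the application: it restates only the gene and enzyme names given in the definitions, and the SEQ ID numbering rule it encodes (ORF n is SEQ ID NO:2n-1, its deduced protein SEQ ID NO:2n) follows the sequence descriptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    number: int
    gene: str    # gene name(s) as used in this application
    enzyme: str  # enzyme name from the definitions

    @property
    def nucleotide_seq_id(self) -> int:
        # ORF n is listed as SEQ ID NO:(2n - 1) ...
        return 2 * self.number - 1

    @property
    def protein_seq_id(self) -> int:
        # ... and its deduced amino acid sequence as SEQ ID NO:2n.
        return 2 * self.number


AN12_ORFS = [
    Orf(1, "dxs", "1-deoxyxylulose-5-phosphate synthase"),
    Orf(2, "dxr", "1-deoxyxylulose-5-phosphate reductoisomerase"),
    Orf(3, "ygbP/ispD", "4-diphosphocytidyl-2C-methyl-D-erythritol synthase"),
    Orf(4, "ychB/ispE", "isopentenyl monophosphate kinase"),
    Orf(5, "ygbB/ispF", "2C-methyl-D-erythritol 2,4-cyclodiphosphate synthase"),
    Orf(6, "ispA", "geranyltransferase or heptaprenyl diphosphate synthase"),
    Orf(7, "crtE", "geranylgeranyl pyrophosphate synthase"),
    Orf(8, "crtB", "phytoene synthase"),
    Orf(9, "crtI", "phytoene dehydrogenase"),
    Orf(10, "crtL", "lycopene cyclase"),
]

if __name__ == "__main__":
    for orf in AN12_ORFS:
        print(f"ORF {orf.number:>2}  {orf.gene:<10} SEQ ID NOs:"
              f"{orf.nucleotide_seq_id}/{orf.protein_seq_id}  {orf.enzyme}")
```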

A nucleic acid molecule is "hybridizable" to another nucleic acid molecule, such as a cDNA, genomic DNA, or RNA, when a single-stranded form of the nucleic acid molecule can anneal to the other nucleic acid molecule under the appropriate conditions of temperature and solution ionic strength. Hybridization and washing conditions are well known and exemplified in Sambrook, J., Fritsch, E. F. and Maniatis, T. Molecular Cloning: A Laboratory Manual, Second Edition, Cold Spring Harbor Laboratory Press, Cold Spring Harbor (1989), particularly Chapter 11 and Table 11.1 therein (entirely incorporated herein by reference). The conditions of temperature and ionic strength determine the "stringency" of the hybridization. Stringency conditions can be adjusted to screen for moderately similar fragments, such as homologous sequences from distantly related organisms, to highly similar fragments, such as genes that duplicate functional enzymes from closely related organisms. Post-hybridization washes determine stringency conditions. One set of preferred conditions uses a series of washes starting with 6X SSC, 0.5% SDS at room temperature for 15 min, then repeated with 2X SSC, 0.5% SDS at 45°C for 30 min, and then repeated twice with 0.2X SSC, 0.5% SDS at 50°C for 30 min. A more preferred set of stringent conditions uses higher temperatures, in which the washes are identical to those above except that the temperature of the final two 30 min washes in 0.2X SSC, 0.5% SDS is increased to 60°C. Another preferred set of highly stringent conditions uses two final washes in 0.1X SSC, 0.1% SDS at 65°C. Yet another set of preferred hybridization conditions includes hybridization at 0.1X SSC, 0.1% SDS, 65°C and washing with 2X SSC, 0.1% SDS followed by 0.1X SSC, 0.1% SDS.

Hybridization requires that the two nucleic acids contain complementary sequences, although, depending on the stringency of the hybridization, mismatches between bases are possible. The appropriate stringency for hybridizing nucleic acids depends on the length of the nucleic acids and the degree of complementation, variables well known in the art. The greater the degree of similarity or homology between two nucleotide sequences, the greater the value of Tm for hybrids of nucleic acids having those sequences. The relative stability (corresponding to higher Tm) of nucleic acid hybridizations decreases in the following order: RNA:RNA, DNA:RNA, DNA:DNA. For hybrids of greater than 100 nucleotides in length, equations for calculating Tm have been derived (see Sambrook et al., supra, 9.50-9.51). For hybridizations with shorter nucleic acids, i.e., oligonucleotides, the position of mismatches becomes more important, and the length of the oligonucleotide determines its specificity (see Sambrook et al., supra, 11.7-11.8). In one embodiment the length for a hybridizable nucleic acid is at least about 10 nucleotides. Preferably, a minimum length for a hybridizable nucleic acid is at least about 15 nucleotides; more preferably at least about 20 nucleotides; and most preferably at least 30 nucleotides. Furthermore, the skilled artisan will recognize that the temperature and wash solution salt concentration may be adjusted as necessary according to factors such as the length of the probe.
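To make the salt and length dependence concrete, the sketch below uses one commonly quoted empirical approximation for DNA:DNA hybrids longer than about 100 bp: Tm = 81.5 + 16.6 log10[Na+] + 0.41(%G+C) - 0.63(% formamide) - 600/L, with [Na+] in mol/L and L the hybrid length in base pairs. Treat this exact form and its coefficients as an assumption for illustration, not as the equation adopted in this application; the function name is hypothetical.

```python
import math


def tm_long_dna_hybrid(na_molar: float, gc_percent: float, length_bp: int,
                       formamide_percent: float = 0.0) -> float:
    """Approximate Tm (deg C) of a DNA:DNA hybrid longer than ~100 bp.

    Assumed empirical form:
    Tm = 81.5 + 16.6*log10([Na+]) + 0.41*(%G+C) - 0.63*(%formamide) - 600/L
    """
    if na_molar <= 0 or length_bp <= 0:
        raise ValueError("Na+ concentration and length must be positive")
    return (81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent
            - 0.63 * formamide_percent - 600.0 / length_bp)


if __name__ == "__main__":
    # Lowering salt (raising stringency) lowers Tm for the same
    # 1,000 bp, 60% G+C hybrid: each 10-fold dilution costs ~16.6 deg C.
    for na in (1.0, 0.1, 0.01):
        print(na, round(tm_long_dna_hybrid(na, 60.0, 1000), 1))
```

This is why low-salt washes such as 0.1X SSC at 65°C act as high-stringency conditions: they bring the wash temperature closer to the Tm of imperfect hybrids, which melt first.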

A "substantial portion" of an amino acid or nucleotide sequence comprises enough of the amino acid sequence of a polypeptide or the nucleotide sequence of a gene to putatively identify that polypeptide or gene, either by manual evaluation of the sequence by one skilled in the art, or by computer-automated sequence comparison and identification using algorithms such as BLAST (Basic Local Alignment Search Tool; Altschul, S. F., et al., (1990) J. Mol. Biol. 215:403-410; see also www.ncbi.nlm.nih.gov/BLAST/). In general, a sequence of ten or more contiguous amino acids or thirty or more nucleotides is necessary in order to putatively identify a polypeptide or nucleic acid sequence as homologous to a known protein or gene. Moreover, with respect to nucleotide sequences, gene-specific oligonucleotide probes comprising 20-30 contiguous nucleotides may be used in sequence-dependent methods of gene identification (e.g., Southern hybridization) and isolation (e.g., in situ hybridization of bacterial colonies or bacteriophage plaques). In addition, short oligonucleotides of 12-15 bases may be used as amplification primers in PCR in order to obtain a particular nucleic acid fragment comprising the primers. Accordingly, a "substantial portion" of a nucleotide sequence comprises enough of the sequence to specifically identify and/or isolate a nucleic acid fragment comprising the sequence.

The instant specification teaches partial or complete amino acid and nucleotide sequences encoding one or more particular microbial proteins. The skilled artisan, having the benefit of the sequences as reported herein, may now use all or a substantial portion of the disclosed sequences for purposes known to those skilled in this art. Accordingly, the instant invention comprises the complete sequences as reported in the accompanying Sequence Listing, as well as substantial portions of those sequences as defined above.

The term "complementary" is used to describe the relationship between nucleotide bases that are capable of hybridizing to one another. For example, with respect to DNA, adenine is complementary to thymine and cytosine is complementary to guanine. Accordingly, the instant invention also includes isolated nucleic acid fragments that are complementary to the complete sequences as reported in the accompanying Sequence Listing, as well as those substantially similar nucleic acid sequences.
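The base-pairing rule above can be expressed in a few lines of Python. This is a hypothetical sketch, not part of the application: it treats both strands as written 5' to 3', pairs them antiparallel, and counts the positions that break the A-T / G-C rule. Such mismatches are what lower-stringency hybridization conditions can tolerate.

```python
# Watson-Crick pairing rules for DNA bases.
PAIR = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA strand written 5'->3'."""
    return "".join(PAIR[base] for base in reversed(seq.upper()))


def antiparallel_mismatches(strand_a: str, strand_b: str) -> int:
    """Count positions where two equal-length strands, both written 5'->3',
    fail to base-pair when aligned antiparallel."""
    if len(strand_a) != len(strand_b):
        raise ValueError("strands must have equal length")
    expected = reverse_complement(strand_a)
    return sum(1 for x, y in zip(expected, strand_b.upper()) if x != y)


def is_complementary(strand_a: str, strand_b: str) -> bool:
    """True when every base pairs (A-T, G-C) in the antiparallel duplex."""
    return antiparallel_mismatches(strand_a, strand_b) == 0


if __name__ == "__main__":
    print(is_complementary("ATGCCG", "CGGCAT"))        # True
    print(antiparallel_mismatches("ATGCCG", "CGGAAT"))  # 1
```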

The term "percent identity", as known in the art, is a relationship between two or more polypeptide sequences or two or more polynucleotide sequences, as determined by comparing the sequences. In the art, "identity" also means the degree of sequence relatedness between polypeptide or polynucleotide sequences, as the case may be, as determined by the match between strings of such sequences. "Identity" and "similarity" can be readily calculated by known methods, including but not limited to those described in: Computational Molecular Biology (Lesk, A. M., ed.) Oxford University Press, NY (1988); Biocomputing: Informatics and Genome Projects (Smith, D. W., ed.) Academic Press, NY (1993); Computer Analysis of Sequence Data, Part I (Griffin, A. M., and Griffin, H. G., eds.) Humana Press, NJ (1994); Sequence Analysis in Molecular Biology (von Heijne, G., ed.) Academic Press (1987); and Sequence Analysis Primer (Gribskov, M. and Devereux, J., eds.) Stockton Press, NY (1991). Preferred methods to determine identity are designed to give the best match between the sequences tested. Methods to determine identity and similarity are codified in publicly available computer programs. Sequence alignments and percent identity calculations may be performed using the Megalign program of the LASERGENE bioinformatics computing suite (DNASTAR Inc., Madison, WI). Multiple alignment of the sequences was performed using the Clustal method of alignment (Higgins and Sharp (1989) CABIOS 5:151-153) with the default parameters (GAP PENALTY=10, GAP LENGTH PENALTY=10). Default parameters for pairwise alignments using the Clustal method were KTUPLE=1, GAP PENALTY=3, WINDOW=5 and DIAGONALS SAVED=5.
Suitable nucleic acid fragments (isolated polynucleotides of the present invention) encode polypeptides that are at least about 70% identical, preferably at least about 80% identical, to the amino acid sequences reported herein. Preferred nucleic acid fragments encode amino acid sequences that are about 85% identical to the amino acid sequences reported herein. More preferred nucleic acid fragments encode amino acid sequences that are at least about 90% identical to the amino acid sequences reported herein. Most preferred are nucleic acid fragments that encode amino acid sequences that are at least about 95% identical to the amino acid sequences reported herein. Suitable nucleic acid fragments not only have the above homologies but typically encode a polypeptide having at least 50 amino acids, preferably at least 100 amino acids, more preferably at least 150 amino acids, still more preferably at least 200 amino acids, and most preferably at least 250 amino acids.

"Codon degeneracy" refers to the property of the genetic code that permits variation of the nucleotide sequence without affecting the amino acid sequence of an encoded polypeptide. Accordingly, the instant invention relates to any nucleic acid fragment that encodes all or a substantial portion of the amino acid sequence of the instant microbial polypeptides as set forth in SEQ ID NOs. The skilled artisan is well aware of the "codon bias" exhibited by a specific host cell in the usage of nucleotide codons to specify a given amino acid. Therefore, when synthesizing a gene for improved expression in a host cell, it is desirable to design the gene such that its frequency of codon usage approaches the frequency of preferred codon usage of the host cell.

"Synthetic genes" can be assembled from oligonucleotide building blocks that are chemically synthesized using procedures known to those skilled in the art. These building blocks are ligated and annealed to form gene segments, which are then enzymatically assembled to construct the entire gene. "Chemically synthesized", as related to a sequence of DNA, means that the component nucleotides were assembled in vitro. Manual chemical synthesis of DNA may be accomplished using well-established procedures, or automated chemical synthesis can be performed using one of a number of commercially available machines. Accordingly, the genes can be tailored for optimal gene expression by optimizing the nucleotide sequence to reflect the codon bias of the host cell. The skilled artisan appreciates the likelihood of successful gene expression if codon usage is biased towards those codons favored by the host. Determination of preferred codons can be based on a survey of genes derived from the host cell where sequence information is available.
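Codon degeneracy and the choice of codons from a survey of host genes can be illustrated together in a short Python sketch. It is hypothetical and not taken from the application. It uses the standard genetic code, treats the most frequent codon per amino acid in the surveyed genes as "preferred", and uses made-up example sequences rather than the Rhodococcus sequences of the Sequence Listing. Practical codon-optimization tools apply more elaborate criteria than this single most-frequent-codon rule.

```python
from collections import Counter

# Standard genetic code; codons enumerated in TCAG order (TTT, TTC, ..., GGG).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def codons(cds: str) -> list[str]:
    """Split a coding sequence into in-frame codons, dropping any partial codon."""
    cds = cds.upper()
    return [cds[n:n + 3] for n in range(0, len(cds) - len(cds) % 3, 3)]


def translate(cds: str) -> str:
    """Translate codon by codon ('*' marks a stop codon)."""
    return "".join(CODON_TABLE[c] for c in codons(cds))


def synonymous_codons(amino_acid: str) -> list[str]:
    """All codons encoding a one-letter amino acid (codon degeneracy)."""
    return sorted(c for c, aa in CODON_TABLE.items() if aa == amino_acid)


def preferred_codons(host_genes: list[str]) -> dict[str, str]:
    """Most frequent codon per amino acid in a survey of host genes.
    Ties go to the alphabetically first codon (an arbitrary, assumed rule)."""
    counts = Counter(c for gene in host_genes for c in codons(gene))
    best: dict[str, str] = {}
    for codon in sorted(counts):
        aa = CODON_TABLE.get(codon)
        if aa is None:
            continue  # ignore codons containing non-ACGT characters
        if aa not in best or counts[codon] > counts[best[aa]]:
            best[aa] = codon
    return best


def recode(protein: str, preferred: dict[str, str]) -> str:
    """Back-translate a protein using one preferred codon per residue."""
    return "".join(preferred[aa] for aa in protein)


if __name__ == "__main__":
    # Degeneracy: two different (made-up) sequences encode the same peptide.
    print(translate("ATGCTGCGTGGCTAA"), translate("ATGTTAAGAGGGTAG"))  # MLRG* MLRG*
    print(synonymous_codons("L"))
    # Codon-usage survey over made-up "host" genes, then recoding.
    prefs = preferred_codons(["ATGCTGCTGAAATAA", "ATGCTGTTAAAGTAA"])
    print(recode("MLK*", prefs))  # ATGCTGAAATAA
```

`recode` raises a `KeyError` for any amino acid that never appears in the surveyed genes, so the survey set should cover every residue in the target protein.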

"Gene" refers to a nucleic acid fragment that expresses a specific
protein, including regulatory sequences preceding (5' non-coding
sequences) and following (3' non-coding sequences) the coding sequence.
"Native gene" refers to a gene as found in nature with its own regulatory
sequences. "Chimeric gene" refers to any gene that is not a native gene,
comprising regulatory and coding sequences that are not found together in
nature. Accordingly, a chimeric gene may comprise regulatory sequences and
coding sequences that are derived from different sources, or regulatory
sequences and coding sequences derived from the same source, but arranged
in a manner different than that found in nature. "Endogenous gene" refers
to a native gene in its natural location in the genome of an organism. A
"foreign" gene refers to a gene not normally found in the host organism,
but that is introduced into the host organism by gene transfer. Foreign
genes can comprise native genes inserted into a non-native organism, or
chimeric genes. A "transgene" is a gene that has been introduced into the
genome by a transformation procedure.

"Coding sequence" refers to a DNA sequence that codes for a specific amino
acid sequence. "Suitable regulatory sequences" refer to nucleotide
sequences located upstream (5' non-coding sequences), within, or
downstream (3' non-coding sequences) of a coding sequence, and which
influence the transcription, RNA processing or stability, or translation
of the associated coding sequence. Regulatory sequences may include
promoters, translation leader sequences, introns, polyadenylation
recognition sequences, RNA processing sites, effector binding sites and
stem-loop structures.

"Promoter" refers to a DNA sequence capable of controlling the expression
of a coding sequence or functional RNA. In general, a coding sequence is
located 3' to a promoter sequence. Promoters may be derived in their
entirety from a native gene, or be composed of different elements derived
from different promoters found in nature, or even comprise synthetic DNA
segments. It is understood by those skilled in the art that different
promoters may direct the expression of a gene in different tissues or
cell types, or at different stages of development, or in response to
different environmental or physiological conditions. Promoters which
cause a gene to be expressed in most cell types at most times are
commonly referred to as "constitutive promoters". It is further
recognized that since in most cases the exact boundaries of regulatory
sequences have not been completely defined, DNA fragments of different
lengths may have identical promoter activity.

The "3' non-coding sequences" refer to DNA sequences located downstream
of a coding sequence and include polyadenylation recognition sequences and
other sequences encoding regulatory signals capable of affecting mRNA
processing or gene expression. The polyadenylation signal is usually
characterized by affecting the addition of polyadenylic acid tracts to the
3' end of the mRNA precursor.

"RNA transcript" refers to the product resulting from RNA
polymerase-catalyzed transcription of a DNA sequence. When the RNA
transcript is a perfect complementary copy of the DNA sequence, it is
referred to as the primary transcript; when it is an RNA sequence derived
from post-transcriptional processing of the primary transcript, it is
referred to as the mature RNA. "Messenger RNA (mRNA)" refers to the RNA
that is without introns and that can be translated into protein by the
cell. "cDNA" refers to a double-stranded DNA that is complementary to and
derived from mRNA. "Sense" RNA refers to an RNA transcript that includes
the mRNA and so can be translated into protein by the cell. "Antisense
RNA" refers to an RNA transcript that is complementary to all or part of a
target primary transcript or mRNA and that blocks the expression of a
target gene (U.S. Patent No. 5,107,065; WO 9928508). The complementarity
of an antisense RNA may be with any part of the specific gene transcript,
i.e., at the 5' non-coding sequence, 3' non-coding sequence, or the coding
sequence. "Functional RNA" refers to antisense RNA, ribozyme RNA, or other
RNA that is not translated yet has an effect on cellular processes.
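
To illustrate the sense/antisense relationship defined above, the short Python sketch below derives a sense transcript from a hypothetical coding-strand DNA sequence and then the fully complementary antisense RNA for all of it or for an internal part. The sequence and function names are illustrative only, and the sketch models base pairing alone; it says nothing about whether a given antisense RNA will actually block expression in a cell.

```python
# Complement rules for RNA bases (U pairs with A, G pairs with C).
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def transcribe(coding_strand):
    """Sense RNA: the coding-strand DNA sequence with U in place of T (5'->3')."""
    return coding_strand.upper().replace("T", "U")


def antisense(rna):
    """Antisense RNA: reverse complement of the given transcript, written 5'->3'."""
    return rna.upper().translate(RNA_COMPLEMENT)[::-1]


sense = transcribe("ATGGCTAAGTAA")  # hypothetical coding strand
print(sense)                        # AUGGCUAAGUAA
print(antisense(sense))             # UUACUUAGCCAU
print(antisense(sense[3:9]))        # CUUAGC (complementary to an internal part only)
```

Slicing the transcript before taking the antisense mirrors the point that complementarity may lie in the 5' non-coding, coding, or 3' non-coding region.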

The term "operably linked" refers to the association of nucleic acid
sequences on a single nucleic acid fragment so that the function of one is
affected by the other. For example, a promoter is operably linked with a
coding sequence when it is capable of affecting the expression of that
coding sequence (i.e., that the coding sequence is under the
transcriptional control of the promoter). Coding sequences can be operably
linked to regulatory sequences in sense or antisense orientation.

The term "expression", as used herein, refers to the transcription and
stable accumulation of sense (mRNA) or antisense RNA derived from the
nucleic acid fragment of the invention. Expression may also refer to
translation of mRNA into a polypeptide.

"Transformation" refers to the transfer of a nucleic acid fragment into
the genome of a host organism, resulting in genetically stable
inheritance. Host organisms containing the transformed nucleic acid
fragments are referred to as "transgenic" or "recombinant" or
"transformed" organisms.

The term "fermentable carbon substrate" refers to a carbon source capable
of being metabolized by host organisms of the present invention and
particularly carbon sources selected from the group consisting of
monosaccharides, oligosaccharides, polysaccharides, and one-carbon
substrates or mixtures thereof.

The terms "plasmid", "vector" and "cassette" refer to an extrachromosomal
element often carrying genes which are not part of the central metabolism
of the cell, and usually in the form of circular double-stranded DNA
fragments. Such elements may be autonomously replicating sequences,
genome integrating sequences, phage or nucleotide sequences, linear or
circular, of a single- or double-stranded DNA or RNA, derived from any
source, in which a number of nucleotide sequences have been joined or
recombined into a unique construction which is capable of introducing a
promoter fragment and DNA sequence for a selected gene product along with
appropriate 3' untranslated sequence into a cell. "Transformation
cassette" refers to a specific vector containing a foreign gene and having
elements in addition to the foreign gene that facilitate transformation of
a particular host cell. "Expression cassette" refers to a specific vector
containing a foreign gene and having elements in addition to the foreign
gene that allow for enhanced expression of that gene in a foreign host.

The term "altered biological activity" will refer to an activity,
associated with a protein encoded by a microbial nucleotide sequence,
which can be measured by an assay method, where that activity is either
greater than or less than the activity associated with the native
microbial sequence. "Enhanced biological activity" refers to an altered
activity that is greater than that associated with the native sequence.
"Diminished biological activity" is an altered activity that is less than
that associated with the native sequence.

The term "sequence analysis software" refers to any computer algorithm or
software program that is useful for the analysis of nucleotide or amino
acid sequences. "Sequence analysis software" may be commercially
available or independently developed. Typical sequence analysis software
will include but is not limited to the GCG suite of programs (Wisconsin
Package Version 9.0, Genetics Computer Group (GCG), Madison, WI), BLASTP,
BLASTN, BLASTX (Altschul et al., J. Mol. Biol. 215:403-410 (1990)), and
DNASTAR (DNASTAR, Inc., 1228 S. Park St., Madison, WI 53715 USA), and the
FASTA program incorporating the Smith-Waterman algorithm (W. R. Pearson,
Comput. Methods Genome Res., [Proc. Int. Symp.] (1994), Meeting Date 1992,
111-20. Editor(s): Suhai, Sandor. Publisher: Plenum, New York, NY).
Within the context of this application it will be understood that where
sequence analysis software is used for analysis, the results of the
analysis will be based on the "default values" of the program referenced,
unless otherwise specified. As used herein "default values" will mean any
set of values or parameters which originally load with the software when
first initialized.

Standard recombinant DNA and molecular cloning techniques used here are
well known in the art and are described by Sambrook, J., Fritsch, E. F.
and Maniatis, T., Molecular Cloning: A Laboratory Manual, Second Edition,
Cold Spring Harbor Laboratory Press, Cold Spring Harbor, NY (1989)
(hereinafter "Maniatis"); by Silhavy, T. J., Berman, M. L. and Enquist,
L. W., Experiments with Gene Fusions, Cold Spring Harbor Laboratory Press,
Cold Spring Harbor, NY (1984); and by Ausubel, F. M. et al., Current
Protocols in Molecular Biology, published by Greene Publishing Assoc. and
Wiley-Interscience (1987).

A variety of nucleotide sequences encoding gene products involved in the
isoprenoid pathway have been isolated from Rhodococcus erythropolis strain
AN12. ORFs 1-5, for example, encode enzymes early in the isoprenoid
pathway (Figure 1) leading to IPP, which is the precursor of all
isoprenoid compounds. ORFs 6 and 7 encode the IspA and CrtE enzymes,
respectively, which are involved in chain elongation by condensing the
IPP precursor. ORFs 8-10 are involved more specifically in carotenoid
production.

Comparison of the dxs nucleotide base and deduced amino acid sequences
(ORF 1) to public databases reveals that the most similar known sequences
are about 70% identical to the amino acid sequence reported herein over a
length of 648 amino acids using a Smith-Waterman alignment algorithm
(W. R. Pearson, Comput. Methods Genome Res., [Proc. Int. Symp.] (1994),
Meeting Date 1992, 111-20. Editor(s): Suhai, Sandor. Publisher: Plenum,
New York, NY). Preferred amino acid fragments are at least about 70%-80%
identical to the sequences herein, wherein 80%-90% identical is more
preferred. Most preferred are amino acid fragments that are at least 95%
identical to the amino acid fragments reported herein. Similarly,
preferred Dxs encoding nucleic acid sequences corresponding to the instant
ORF are those encoding active proteins and which are at least 80%
identical to the nucleic acid sequences reported herein. More preferred
Dxs nucleic acid fragments are at least 90% identical to the sequences
herein. Most preferred are Dxs nucleic acid fragments that are at least
95% identical to the nucleic acid fragments reported herein.
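
The percent-identity figures in this section rest on Smith-Waterman local alignment. The Python sketch below is a simplified stand-in, not the FASTA or GCG implementation relied on in the text: it assumes a flat match/mismatch score and a linear gap penalty (the published tools use substitution matrices such as BLOSUM62 for proteins, affine gap penalties, and their own default values), so it will not reproduce the reported numbers. Identity is computed here as identical positions divided by aligned columns, which is one common convention among several.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment of a and b (linear gap penalty); returns two gapped strings."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell ends the local alignment.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom))


def percent_identity(a, b, **scoring):
    """Percent of aligned columns that are identical, plus the alignment length."""
    top, bottom = smith_waterman(a.upper(), b.upper(), **scoring)
    if not top:
        return 0.0, 0
    identical = sum(x == y for x, y in zip(top, bottom))
    return 100.0 * identical / len(top), len(top)


print(percent_identity("ACGTTGCA", "ACGATGCA"))  # (87.5, 8)
```

The quadratic table is fine for single ORFs of a few hundred residues; database-scale searches use heuristics (FASTA, BLAST) for speed.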

Comparison of the Dxr base and deduced amino acid sequences to public
databases reveals that the most similar known sequence is 71% identical at
the amino acid level over a length of 385 amino acids (ORF 2) using a
Smith-Waterman alignment algorithm (W. R. Pearson, supra). Preferred amino
acid fragments are at least about 70%-80% identical to the sequences
herein, wherein 80%-90% identical is more preferred. Most preferred are
amino acid fragments that are at least 95% identical to the amino acid
fragments reported herein. Similarly, preferred Dxr encoding nucleic acid
sequences corresponding to the instant ORF are those encoding active
proteins and which are at least 80% identical to the nucleic acid
sequences reported herein. More preferred Dxr nucleic acid fragments are
at least 90% identical to the sequences herein. Most preferred are Dxr
nucleic acid fragments that are at least 95% identical to the nucleic acid
fragments reported herein.

Comparison of the YgbP (IspD) base and deduced amino acid sequences to
public databases reveals that the most similar known sequences are about
53% identical at the amino acid level over a length of 232 amino acids
(ORF 3) using a Smith-Waterman alignment algorithm (W. R. Pearson, supra).
Preferred amino acid fragments are at least about 70%-80% identical to
the sequences herein, wherein 80%-90% identical is more preferred. Most
preferred are amino acid fragments that are at least 95% identical to the
amino acid fragments reported herein. Similarly, preferred YgbP (IspD)
encoding nucleic acid sequences corresponding to the instant ORF are those
encoding active proteins and which are at least 80% identical to the
nucleic acid sequences reported herein. More preferred YgbP (IspD)
nucleic acid fragments are at least 90% identical to the sequences herein.
Most preferred are YgbP (IspD) nucleic acid fragments that are at least
95% identical to the nucleic acid fragments reported herein.

Comparison of the YchB (IspE) base and deduced amino acid sequences to
public databases reveals that the most similar known sequences are about
62% identical at the amino acid level over a length of 311 amino acids
(ORF 4) using a Smith-Waterman alignment algorithm (W. R. Pearson, supra).
Preferred amino acid fragments are at least about 70%-80% identical to
the sequences herein, wherein 80%-90% identical is more preferred. Most
preferred are amino acid fragments that are at least 95% identical to the
amino acid fragments reported herein. Similarly, preferred YchB (IspE)
encoding nucleic acid sequences corresponding to the instant ORF are those
encoding active proteins and which are at least 80% identical to the
nucleic acid sequences reported herein. More preferred YchB (IspE)
nucleic acid fragments are at least 90% identical to the sequences herein.
Most preferred are YchB (IspE) nucleic acid fragments that are at least
95% identical to the nucleic acid fragments reported herein.

Comparison of the YgbB (IspF) base and deduced amino acid sequences to
public databases reveals that the most similar known sequences are about
57% identical at the amino acid level over a length of 158 amino acids
(ORF 5) using a Smith-Waterman alignment algorithm (W. R. Pearson, supra).
Preferred amino acid fragments are at least about 70%-80% identical to
the sequences herein, wherein 80%-90% identical is more preferred. Most
preferred are amino acid fragments that are at least 95% identical to the
amino acid fragments reported herein. Similarly, preferred YgbB (IspF)
encoding nucleic acid sequences corresponding to the instant ORF are those
encoding active proteins and which are at least 80% identical to the
nucleic acid sequences reported herein. More preferred YgbB (IspF)
nucleic acid fragments are at least 90% identical to the sequences herein.
Most preferred are YgbB (IspF) nucleic acid fragments that are at least
95% identical to the nucleic acid fragments reported herein.

Comparison of the IspA base and deduced amino acid sequences to public
databases reveals that the most similar known sequences are about 57%
identical at the amino acid level over a length of 344 amino acids (ORF 6)
using a Smith-Waterman alignment algorithm (W. R. Pearson, supra).
Preferred amino acid fragments are at least about 70%-80% identical to
the sequences herein, wherein 80%-90% identical is more preferred. Most
preferred are amino acid fragments that are at least 95% identical to the
amino acid fragments reported herein. Similarly, preferred IspA encoding
nucleic acid sequences corresponding to the instant ORF are those encoding
active proteins and which are at least 80% identical to the nucleic acid
sequences reported herein. More preferred IspA nucleic acid fragments are
at least 90% identical to the sequences herein. Most preferred are IspA
nucleic acid fragments that are at least 95% identical to the nucleic acid
fragments reported herein.

Comparison of the CrtE base and deduced amino acid sequences to public
databases reveals that the most similar known sequences are about 41%
identical at the amino acid level over a length of 378 amino acids (ORF 7)
using a Smith-Waterman alignment algorithm (W. R. Pearson, supra).
Preferred amino acid fragments are at least about 70%-80% identical to
the sequences herein, wherein 80%-90% identical is more preferred. Most
preferred are amino acid fragments that are at least 95% identical to the
amino acid fragments reported herein. Similarly, preferred CrtE encoding
nucleic acid sequences corresponding to the instant ORF are those encoding
active proteins and which are at least 80% identical to the nucleic acid
sequences reported herein. More preferred CrtE nucleic acid fragments are
at least 90% identical to the sequences herein. Most preferred are CrtE
nucleic acid fragments that are at least 95% identical to the nucleic acid
fragments reported herein.

Comparison of the CrtB base and deduced amino acid sequences to public
databases reveals that the most similar known sequences are about 47%
identical at the amino acid level over a length of 314 amino acids (ORF 8)
using a Smith-Waterman alignment algorithm (W. R. Pearson, supra).
Preferred amino acid fragments are at least about 70%-80% identical to
the sequences herein, wherein 80%-90% identical is more preferred. Most
preferred are amino acid fragments that are at least 95% identical to the
amino acid fragments reported herein. Similarly, preferred nucleic acid
sequences corresponding to the instant ORF are those encoding active
proteins and which are at least 80% identical to the nucleic acid
sequences reported herein. More preferred nucleic acid fragments are at
least 90% identical to the sequences herein. Most preferred are nucleic
acid fragments that are at least 95% identical to the nucleic acid
fragments reported herein.

Comparison of the CrtI base and deduced amino acid sequences to public
databases reveals that the most similar known sequences are about 45%
identical at the amino acid level over a length of 530 amino acids (ORF 9)
using a Smith-Waterman alignment algorithm (W. R. Pearson, supra).
Preferred amino acid fragments are at least about 70%-80% identical to
the sequences herein, wherein 80%-90% identical is more preferred. Most
preferred are amino acid fragments that are at least 95% identical to the
amino acid fragments reported herein. Similarly, preferred nucleic acid
sequences corresponding to the instant ORF are those encoding active
proteins and which are at least 80% identical to the nucleic acid
sequences reported herein. More preferred nucleic acid fragments are at
least 90% identical to the sequences herein. Most preferred are nucleic
acid fragments that are at least 95% identical to the nucleic acid
fragments reported herein.

Comparison of the CrtL base and deduced amino acid sequences to public
databases reveals that the most similar known sequences are about 31%
identical at the amino acid level over a length of 376 amino acids
(ORF 10) using a Smith-Waterman alignment algorithm (W. R. Pearson,
supra). Preferred amino acid fragments are at least about 70%-80%
identical to the sequences herein, wherein 80%-90% identical is more
preferred. Most preferred are amino acid fragments that are at least 95%
identical to the amino acid fragments reported herein. Similarly,
preferred nucleic acid sequences corresponding to the instant ORF are
those encoding active proteins and which are at least 80% identical to
the nucleic acid sequences reported herein. More preferred nucleic acid
fragments are at least 90% identical to the sequences herein. Most
preferred are nucleic acid fragments that are at least 95% identical to
the nucleic acid fragments reported herein.
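
The preference tiers repeated in the ORF 1-10 paragraphs amount to a simple threshold rule. The Python sketch below encodes one reading of them; treating each stated range by its lower bound (so that, for example, a 92% amino acid identity lands in the "more preferred" tier) is an assumption, since the text does not explicitly assign the 90%-95% amino acid band. The identity value itself would come from an alignment program run with its default values, and the names used here are hypothetical.

```python
# Lower bounds read from the preference language above (an interpretation:
# "at least about 70%-80%" is treated as a 70% floor for the "preferred" tier).
PROTEIN_TIERS = [(95.0, "most preferred"), (80.0, "more preferred"), (70.0, "preferred")]
NUCLEIC_ACID_TIERS = [(95.0, "most preferred"), (90.0, "more preferred"), (80.0, "preferred")]


def preference_tier(percent_identity, molecule="protein"):
    """Map a percent identity onto the stated tiers; None if below every floor."""
    tiers = PROTEIN_TIERS if molecule == "protein" else NUCLEIC_ACID_TIERS
    for floor, label in tiers:  # checked from the strictest floor downward
        if percent_identity >= floor:
            return label
    return None


print(preference_tier(92.0))                  # more preferred
print(preference_tier(85.0, "nucleic acid"))  # preferred
```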

The nucleic acid fragments of the instant invention may be used to
isolate genes encoding homologous proteins from the same or other
microbial species. Isolation of homologous genes using sequence-dependent
protocols is well known in the art. Examples of sequence-dependent
protocols include, but are not limited to, methods of nucleic acid
hybridization, and methods of DNA and RNA amplification as exemplified by
various uses of nucleic acid amplification technologies (e.g., polymerase
chain reaction (PCR), Mullis et al., U.S. Patent 4,683,202; ligase chain
reaction (LCR), Tabor, S. et al., Proc. Natl. Acad. Sci. USA 82, 1074
(1985); or strand displacement amplification (SDA), Walker et al., Proc.
Natl. Acad. Sci. U.S.A. 89, 392 (1992)).

For example, genes encoding similar proteins or polypeptides to those of
the instant invention could be isolated directly by using all or a
portion of the instant nucleic acid fragments as DNA hybridization probes
to screen libraries from any desired bacteria using methodology well
known to those skilled in the art. Specific oligonucleotide probes based
upon the instant nucleic acid sequences can be designed and synthesized by
methods known in the art (Maniatis). Moreover, the entire sequences can be
used directly to synthesize DNA probes by methods known to the skilled
artisan, such as random primer DNA labeling, nick translation, or
end-labeling techniques, or RNA probes using available in vitro
transcription systems. In addition, specific primers can be designed and
used to amplify part or all of the instant sequences. The resulting
amplification products can be labeled directly during amplification
reactions or labeled after amplification reactions, and used as probes to
isolate full-length DNA fragments under conditions of appropriate
stringency.

Typically, in PCR-type amplification techniques, the primers have
different sequences and are not complementary to each other. Depending on
the desired test conditions, the sequences of the primers should be
designed to provide for both efficient and faithful replication of the
target nucleic acid. Methods of PCR primer design are common and well
known in the art (Thein and Wallace, "The use of oligonucleotide as
specific hybridization probes in the diagnosis of genetic disorders", in
Human Genetic Diseases: A Practical Approach, K. E. Davis, Ed., (1986)
pp. 33-50, IRL Press, Herndon, Virginia; Rychlik, W. (1993) In White,
B. A. (ed.), Methods in Molecular Biology, Vol. 15, pages 31-39, PCR
Protocols: Current Methods and Applications, Humana Press, Inc., Totowa,
NJ).
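
The primer requirements above (different, mutually non-complementary sequences suited to efficient and faithful amplification) can be screened with a few quick checks. The Python sketch below is an illustration under stated assumptions, not a substitute for the cited design methods: it estimates melting temperature with the Wallace rule, Tm ≈ 2(A+T) + 4(G+C) °C, which is only a rough guide for short oligonucleotides (about 14-20 bases), and the 4-base 3'-end window and 5 °C Tm tolerance are arbitrary illustrative parameters. All function names are hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def wallace_tm(primer):
    """Rough melting temperature (deg C) for short oligos: 2*(A+T) + 4*(G+C)."""
    p = primer.upper()
    return 2 * (p.count("A") + p.count("T")) + 4 * (p.count("G") + p.count("C"))


def gc_percent(primer):
    p = primer.upper()
    return 100.0 * (p.count("G") + p.count("C")) / len(p)


def three_prime_dimer(p1, p2, window=4):
    """True if the last `window` bases of p1 could pair antiparallel with part of p2."""
    return reverse_complement(p1[-window:]) in p2.upper()


def check_pair(fwd, rev, max_tm_diff=5):
    """List simple design problems for a PCR primer pair (empty list = no flags)."""
    problems = []
    if fwd.upper() == rev.upper():
        problems.append("primers are identical")
    if three_prime_dimer(fwd, rev) or three_prime_dimer(rev, fwd):
        problems.append("3' ends are complementary (primer-dimer risk)")
    if abs(wallace_tm(fwd) - wallace_tm(rev)) > max_tm_diff:
        problems.append(f"Tm values differ by more than {max_tm_diff} C")
    return problems


print(wallace_tm("CCCCAAAA"))                # 24
print(check_pair("CCCCAAAA", "TTTTGGGG"))    # ["3' ends are complementary (primer-dimer risk)"]
```

Dedicated design tools go further, using nearest-neighbor thermodynamics and checking hairpins and off-target priming; the sketch only captures the two conditions the text names.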

Generally, two short segments of the instant sequences may be used in
polymerase chain reaction protocols to amplify longer nucleic acid
fragments encoding homologous genes from DNA or RNA. The polymerase chain
reaction may also be performed on a library of cloned nucleic acid
fragments wherein the sequence of one primer is derived from the instant
nucleic acid fragments, and the sequence of the other primer takes
advantage of the presence of the polyadenylic acid tracts to the 3' end
of the mRNA precursor encoding microbial genes.

Alternatively, the second primer sequence may be based upon sequences
derived from the cloning vector. For example, the skilled artisan can
follow the RACE protocol (Frohman et al., PNAS USA 85:8998 (1988)) to
generate cDNAs by using PCR to amplify copies of the region between a
single point in the transcript and the 3' or 5' end. Primers oriented in
the 3' and 5' directions can be designed from the instant sequences. Using
commercially available 3' RACE or 5' RACE systems (BRL), specific 3' or 5'
cDNA fragments can be isolated (Ohara et al., PNAS USA 86:5673 (1989); Loh
et al., Science 243:217 (1989)).
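
To make the primer orientations concrete: a gene-specific primer for 3' RACE reads along the sense strand so that it extends toward the 3' end, while the primer used for 5' RACE is the reverse complement of the sense sequence so that it extends toward the 5' end. The Python sketch below derives both from a known sequence. The function name, the 20-base default length, and the toy sequence are hypothetical, and the anchor or adapter primers supplied with commercial RACE kits are not modeled.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def race_primers(known_sense_seq, position, length=20):
    """Gene-specific primers on either side of `position` in a known sense-strand sequence.

    The 3' RACE primer copies the sense strand so it extends toward the 3' end;
    the 5' RACE primer is the reverse complement so it extends toward the 5' end.
    """
    seq = known_sense_seq.upper()
    if position - length < 0 or position + length > len(seq):
        raise ValueError("primer would run past the end of the known sequence")
    three_prime_gsp = seq[position:position + length]
    five_prime_gsp = reverse_complement(seq[position - length:position])
    return three_prime_gsp, five_prime_gsp


print(race_primers("ATGCATGCAAGGTTCC", position=8, length=4))  # ('AAGG', 'GCAT')
```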

Alternatively, the instant sequences may be employed as hybridization
reagents for the identification of homologs. The basic components of a
nucleic acid hybridization test include a probe, a sample suspected of
containing the gene or gene fragment of interest, and a specific
hybridization method. Probes of the present invention are typically
single-stranded nucleic acid sequences which are complementary to the
nucleic acid sequences to be detected. Probes are "hybridizable" to the
nucleic acid sequence to be detected. The probe length can vary from
5 bases to tens of thousands of bases, and will depend upon the specific
test to be done. Typically a probe length of about 15 bases to about
30 bases is suitable. Only part of the probe molecule need be
complementary to the nucleic acid sequence to be detected. In addition,
the complementarity between the probe and the target sequence need not be
perfect. Hybridization does occur between imperfectly complementary
molecules with the result that a certain fraction of the bases in the
hybridized region are not paired with the proper complementary base.
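
Imperfect complementarity, as described above, can be illustrated by scanning a target strand for sites where a probe would pair with at most a few unpaired bases. The Python sketch below is a simplification under stated assumptions: it counts mismatches only (no insertions or deletions) and treats every mismatch alike, whereas real duplex stability depends on mismatch position, probe length, salt, temperature and formamide concentration. The probe, target sequences and function name are hypothetical.

```python
DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.upper().translate(DNA_COMPLEMENT)[::-1]


def probe_hits(probe, target, max_mismatches=2):
    """Positions on `target` (5'->3') where `probe` could pair, allowing some mismatches.

    A probe pairs antiparallel, so it binds where the target reads as the
    probe's reverse complement; each differing base is one unpaired position.
    """
    site = reverse_complement(probe)
    target = target.upper()
    hits = []
    for start in range(len(target) - len(site) + 1):
        window = target[start:start + len(site)]
        mismatches = sum(x != y for x, y in zip(site, window))
        if mismatches <= max_mismatches:
            hits.append((start, mismatches))
    return hits


probe = "GCATGCAT"  # hypothetical probe
print(probe_hits(probe, "GGGATGCATGCTTT", max_mismatches=0))  # [(3, 0)]
print(probe_hits(probe, "GGGATGAATGCTTT", max_mismatches=1))  # [(3, 1)]
```

Raising `max_mismatches` corresponds loosely to lowering stringency: more imperfect sites are accepted as potential hybrids.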

Hybridization methods are well defined. Typically the probe and sample
must be mixed under conditions which will permit nucleic acid
hybridization. This involves contacting the probe and sample in the
presence of an inorganic or organic salt under the proper concentration
and temperature conditions. The probe and sample nucleic acids must be in
contact for a long enough time that any possible hybridization between
the probe and sample nucleic acid may occur. The concentration of probe
or target in the mixture will determine the time necessary for
hybridization to occur: the higher the probe or target concentration, the
shorter the hybridization incubation time needed. Optionally, a chaotropic
agent may be added. The chaotropic agent stabilizes nucleic acids by
inhibiting nuclease activity. Furthermore, the chaotropic agent allows
sensitive and stringent hybridization of short oligonucleotide probes at
room temperature (Van Ness and Chen (1991) Nucl. Acids Res.
19:5143-5151). Suitable chaotropic agents include guanidinium chloride,
guanidinium thiocyanate, sodium thiocyanate, lithium tetrachloroacetate,
sodium perchlorate, rubidium tetrachloroacetate, potassium iodide, and
cesium trifluoroacetate, among others. Typically, the chaotropic agent
will be present at a final concentration of about 3 M. If desired, one
can add formamide to the hybridization mixture, typically 30-50% (v/v).

Various hybridization solutions can be employed. Typically, these
comprise from about 20 to 60% volume, preferably 30%, of a polar organic
solvent. A common hybridization solution employs about 30-50% v/v
formamide, about 0.15 to 1 M sodium chloride, about 0.05 to 0.1 M
buffers, such as sodium citrate, Tris-HCl, PIPES or HEPES (pH range about
6-9), about 0.05 to 0.2% detergent, such as sodium dodecylsulfate, or
between 0.5-20 mM EDTA, FICOLL (Pharmacia Inc.) (about 300-500
kilodaltons), polyvinylpyrrolidone (about 250-500 kdal), and serum
albumin. Also included in the typical hybridization solution will be
unlabeled carrier nucleic acids from about 0.1 to 5 mg/mL, fragmented
nucleic DNA, e.g., calf thymus or salmon sperm DNA, or yeast RNA, and
optionally from about 0.5 to 2% wt/vol glycine. Other additives may also
be included, such as volume exclusion agents which include a variety of
polar water-soluble or swellable agents, such as polyethylene glycol,
anionic polymers such as polyacrylate or polymethylacrylate, and anionic
saccharidic polymers, such as dextran sulfate.
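
Because each component of such a solution is specified as a final concentration, preparing it reduces to the dilution relation C1V1 = C2V2 applied to each stock. The Python sketch below computes stock volumes for a hypothetical 10 mL batch. The final concentrations are example values chosen from within the ranges given above (the NaCl/sodium citrate pair matches the salt composition of 5x SSC), the stock strengths are assumptions, and carrier nucleic acid, FICOLL, polyvinylpyrrolidone, serum albumin and volume-exclusion agents are omitted for brevity.

```python
def hybridization_recipe(final_ml, components):
    """Stock volumes (mL) for a solution, from C1*V1 = C2*V2 for each component.

    `components` maps a name to (stock concentration, final concentration),
    both in the same units; water makes up the remaining volume.
    """
    volumes = {}
    for name, (stock, final) in components.items():
        if final > stock:
            raise ValueError(f"{name}: final concentration exceeds the stock")
        volumes[name] = final_ml * final / stock
    used = sum(volumes.values())
    if used > final_ml:
        raise ValueError("stock volumes exceed the final volume")
    volumes["water"] = final_ml - used
    return volumes


# Example final concentrations from the ranges above; stock strengths are assumptions.
recipe = hybridization_recipe(10.0, {
    "formamide (% v/v)": (100.0, 50.0),
    "NaCl (M)": (5.0, 0.75),
    "sodium citrate (M)": (1.0, 0.075),
    "SDS (% w/v)": (10.0, 0.1),
    "EDTA (mM)": (500.0, 5.0),
})
for name, ml in recipe.items():
    print(f"{name:>20}: {ml:.2f} mL")
```

For this batch the script calls for 5 mL formamide, 1.5 mL of 5 M NaCl, 0.75 mL of 1 M sodium citrate, 0.1 mL each of the SDS and EDTA stocks, and water to 10 mL.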

Nucleic acid hybridization is adaptable to a variety of assay formats. One
of the most suitable is the sandwich assay format. The sandwich assay is
particularly adaptable to hybridization under non-denaturing conditions.
A primary component of a sandwich-type assay is a solid support. The solid
support has adsorbed to it or covalently coupled to it immobilized nucleic
acid probe that is unlabeled and complementary to one portion of the
sequence.

Availability of the instant nucleotide and deduced amino acid sequences
facilitates immunological screening of DNA expression libraries.
Synthetic peptides representing portions of the instant amino acid
sequences may be synthesized. These peptides can be used to immunize
animals to produce polyclonal or monoclonal antibodies with specificity
for peptides or proteins comprising the amino acid sequences. These
antibodies can then be used to screen DNA expression libraries to isolate
full-length DNA clones of interest (Lerner, R. A., Adv. Immunol. 36:1
(1984); Maniatis).

The genes and gene products of the instant sequences may be produced in
heterologous host cells, particularly in the cells of microbial hosts.
Expression in recombinant microbial hosts may be useful for the
expression of various pathway intermediates, for the modulation of
pathways already existing in the host, or for the synthesis of new
products heretofore not possible using the host.

Preferred heterologous host cells for expression of the instant genes and
nucleic acid fragments are microbial hosts that can be found broadly
within the fungal or bacterial families and which grow over a wide range
of temperature, pH values, and solvent tolerances. For example, it is
contemplated that any of bacteria, yeast, and filamentous fungi will be
suitable hosts for expression of the present nucleic acid fragments.
Because transcription, translation and the protein biosynthetic apparatus
are the same irrespective of the cellular feedstock, functional genes are
expressed irrespective of the carbon feedstock used to generate cellular
biomass. Large-scale microbial growth and functional gene expression may
utilize a wide range of simple or complex carbohydrates, organic acids
and alcohols, saturated hydrocarbons such as methane, or carbon dioxide in
the case of photosynthetic or chemoautotrophic hosts. However, the
functional genes may be regulated, repressed or derepressed by specific
growth conditions, which may include the form and amount of nitrogen,
phosphorous, sulfur, oxygen, carbon or any trace micronutrient including
small inorganic ions. In addition, the regulation of functional genes may
be achieved by the presence or absence of specific regulatory molecules
that are added to the culture and are not typically considered nutrient
or energy sources. Growth rate may also be an important regulatory factor
in gene expression. Examples of host strains include but are not limited
to fungal or yeast species such as Aspergillus, Trichoderma,
Saccharomyces, Pichia, Candida, Hansenula, or bacterial species such as
Salmonella, Bacillus, Acinetobacter, Zymomonas, Agrobacterium,
Erythrobacter, Chlorobium, Chromatium, Flavobacterium, Cytophaga,
Rhodobacter, Rhodococcus, Streptomyces, Brevibacterium, Corynebacteria,
Mycobacterium, Deinococcus, Escherichia, Erwinia, Pantoea, Pseudomonas,
Sphingomonas, Methylomonas, Methylobacter, Methylococcus, Methylosinus,
Methylomicrobium, Methylocystis, Alcaligenes, Synechocystis,
Synechococcus, Anabaena, Myxococcus, Thiobacillus, Methanobacterium and
Klebsiella.

Microbial expression systems and expression vectors containing regulatory
sequences that direct high level expression of foreign proteins are well
known to those skilled in the art. Any of these could be used to
construct chimeric genes for production of any of the gene products of
the instant sequences. These chimeric genes could then be introduced into
appropriate microorganisms via transformation to provide high level
expression of the enzymes.

Accordingly, it is expected, for example, that introduction of chimeric
genes encoding the instant bacterial enzymes under the control of the
appropriate promoters will demonstrate increased isoprenoid production.
It is contemplated that it will be useful to express the instant genes
both in natural host cells and in heterologous hosts. Introduction of the
present genes into the native host will result in elevated levels of
existing isoprenoid production. Additionally, the instant genes may also
be introduced into non-native host bacteria where such hosts offer
advantages for manipulating isoprenoid compound production that are not
present in Rhodococcus.

Vectors or cassettes useful for the transformation of suitable host cells
are well known in the art. Typically the vector or cassette contains
sequences directing transcription and translation of the relevant gene, a
selectable marker, and sequences allowing autonomous replication or
chromosomal integration. Suitable vectors comprise a region 5' of the gene
which harbors transcriptional initiation controls and a region 3' of the
DNA fragment which controls transcriptional termination. It is most
preferred when both control regions are derived from genes homologous to
the transformed host cell, although it is to be understood that such
control regions need not be derived from the genes native to the specific
species chosen as a production host.

Initiation control regions or promoters, which are useful to drive
expression of the instant ORFs in the desired host cell, are numerous and
familiar to those skilled in the art. Virtually any promoter capable of driving
these genes is suitable for the present invention, including but not limited to
CYC1, HIS3, GAL1, GAL10, ADH1, PGK, PHO5, GAPDH, ADC1, TRP1,
URA3, LEU2, ENO, TPI (useful for expression in Saccharomyces); AOX1
(useful for expression in Pichia); and lac, ara, tet, trp, λPL, λPR, T7, tac, and
trc (useful for expression in Escherichia coli), as well as the amy, apr, npr
promoters and various phage promoters useful for expression in Bacillus.

Termination control regions may also be derived from various
genes native to the preferred hosts. Optionally, a termination site may be
unnecessary; however, it is most preferred if included.

Knowledge of the sequence of the present genes will be useful in
manipulating the isoprenoid biosynthetic pathways in any organism having
such a pathway, and particularly in methanotrophs. Methods of
manipulating genetic pathways are common and well known in the art.
Selected genes in a particular pathway may be upregulated or
downregulated by a variety of methods. Additionally, competing pathways
in that organism may be eliminated or sublimated by gene disruption and
similar techniques.

Once a key genetic pathway has been identified and sequenced,
specific genes may be upregulated to increase the output of the pathway.
For example, additional copies of the targeted genes may be introduced
into the host cell on multicopy plasmids such as pBR322. Alternatively, the
target genes may be modified so as to be under the control of non-native
promoters. Where it is desired that a pathway operate at a particular point
in a cell cycle or during a fermentation run, regulated or inducible
promoters may be used to replace the native promoter of the target gene.
Similarly, in some cases the native or endogenous promoter may be
modified to increase gene expression. For example, endogenous
promoters can be altered in vivo by mutation, deletion, and/or substitution
(see Kmiec, U.S. Patent 5,565,350; Zarling et al., PCT/US93/03868).

Alternatively, it may be necessary to reduce or eliminate the
expression of certain genes in the target pathway or in competing
pathways that may serve as competing sinks for energy or carbon.
Methods of downregulating genes for this purpose have been explored.
Where the sequence of the gene to be disrupted is known, one of the most
effective methods for gene downregulation is targeted gene disruption,
where foreign DNA is inserted into a structural gene so as to disrupt
transcription. This can be effected by the creation of genetic cassettes
comprising the DNA to be inserted (often a genetic marker) flanked by
sequence having a high degree of homology to a portion of the gene to be
disrupted. Introduction of the cassette into the host cell results in insertion
of the foreign DNA into the structural gene via the native DNA replication
mechanisms of the cell. (See, for example, Hamilton et al. (1989) J.
Bacteriol. 171:4617-4622; Balbas et al. (1993) Gene 136:211-213;
Gueldener et al. (1996) Nucleic Acids Res. 24:2519-2524; and Smith et al.
(1996) Methods Mol. Cell Biol. 5:270-277.)
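
As a rough in-silico illustration of the cassette design described above, the Python sketch below builds a disruption cassette by flanking a marker with two homology arms copied from the target gene, and then models a double-crossover insertion as a plain string operation. The function names, arm length and sequences are hypothetical; real recombination depends on arm length, sequence identity and the host's recombination machinery, none of which is modeled here.

```python
def build_cassette(gene: str, marker: str, cut: int, arm: int) -> str:
    """Flank `marker` with homology arms copied from `gene` on either side of `cut`."""
    if arm <= 0 or cut - arm < 0 or cut + arm > len(gene):
        raise ValueError("homology arms must be non-empty and lie within the gene")
    return gene[cut - arm:cut] + marker + gene[cut:cut + arm]


def integrate(chromosome: str, cassette: str, arm: int) -> str:
    """Toy double crossover: insert the marker where the two arms sit side by side."""
    if arm <= 0 or len(cassette) < 2 * arm:
        raise ValueError("cassette is shorter than its two homology arms")
    left, right = cassette[:arm], cassette[-arm:]
    marker = cassette[arm:len(cassette) - arm]
    site = chromosome.find(left + right)  # first contiguous match of both arms
    if site == -1:
        raise ValueError("no contiguous homology to the cassette arms")
    junction = site + arm
    return chromosome[:junction] + marker + chromosome[junction:]
```

The disrupted allele carries the marker between the two halves of the gene, which is why transcription through the structural gene is interrupted.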

Antisense technology is another method of downregulating genes
where the sequence of the target gene is known. To accomplish this, a
nucleic acid segment from the desired gene is cloned and operably linked
to a promoter such that the antisense strand of RNA will be transcribed.
This construct is then introduced into the host cell and the antisense strand
of RNA is produced. Antisense RNA inhibits gene expression by
preventing the accumulation of mRNA which encodes the protein of
interest. The person skilled in the art will know that special considerations
are associated with the use of antisense technologies in order to reduce
expression of particular genes. For example, the proper level of
expression of antisense genes may require the use of different chimeric
genes utilizing different regulatory elements known to the skilled artisan.
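
The relationship between the sense and antisense transcripts can be made concrete with a small Python sketch (the function names are illustrative): the antisense RNA is the reverse complement of the mRNA, so the two can base-pair along their full length. The sketch deliberately ignores everything that determines whether antisense inhibition works in practice, such as expression level, RNA secondary structure and stability.

```python
# Watson-Crick partners for RNA bases; DNA "T" is read as "U" in transcripts.
_RNA_PAIR = {"A": "U", "U": "A", "G": "C", "C": "G"}


def sense_mrna(coding_dna: str) -> str:
    """mRNA has the same sequence as the coding (sense) DNA strand, with U for T."""
    return coding_dna.upper().replace("T", "U")


def antisense_rna(coding_dna: str) -> str:
    """RNA complementary and antiparallel to the mRNA, written 5'->3'."""
    mrna = sense_mrna(coding_dna)
    return "".join(_RNA_PAIR[base] for base in reversed(mrna))


def forms_perfect_duplex(rna_a: str, rna_b: str) -> bool:
    """True if rna_b pairs base-for-base with rna_a when aligned antiparallel."""
    return len(rna_a) == len(rna_b) and all(
        _RNA_PAIR[x] == y for x, y in zip(rna_a, reversed(rna_b))
    )
```

For example, the coding segment `ATGC` gives the mRNA `AUGC` and the antisense RNA `GCAU`.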

Although targeted gene disruption and antisense technology offer
effective means of downregulating genes where the sequence is known,
other less specific methodologies have been developed that are not
sequence based. For example, cells may be exposed to UV radiation
and then screened for the desired phenotype. Mutagenesis with chemical
agents is also effective for generating mutants, and commonly used
substances include chemicals that affect nonreplicating DNA, such as
HNO2 and NH2OH, as well as agents that affect replicating DNA, such as
acridine dyes, notable for causing frameshift mutations. Specific methods
for creating mutants using radiation or chemical agents are well
documented in the art. See, for example, Thomas D. Brock in
Biotechnology: A Textbook of Industrial Microbiology, Second Edition
(1989) Sinauer Associates, Inc., Sunderland, MA, or Deshpande, Mukund
V., Appl. Biochem. Biotechnol., 36, 227 (1992).

Another non-specific method of gene disruption is the use of
transposable elements or transposons. Transposons are genetic
elements that insert randomly in DNA but can later be retrieved on the
basis of sequence to determine where the insertion has occurred. Both
in vivo and in vitro transposition methods are known. Both methods involve
the use of a transposable element in combination with a transposase
enzyme. When the transposable element, or transposon, is contacted with
a nucleic acid fragment in the presence of the transposase, the
transposable element will randomly insert into the nucleic acid fragment.
The technique is useful for random mutagenesis and for gene isolation,
since the disrupted gene may be identified on the basis of the sequence of
the transposable element. Kits for in vitro transposition are commercially
available (see, for example, the Primer Island Transposition Kit, available
from Perkin Elmer Applied Biosystems, Branchburg, NJ, based upon the
yeast Ty1 element; the Genome Priming System, available from New
England Biolabs, Beverly, MA, based upon the bacterial transposon Tn7;
and the EZ::TN Transposon Insertion Systems, available from Epicentre
Technologies, Madison, WI, based upon the Tn5 bacterial transposable
element).
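
In practice an insertion is located by sequencing outward from the transposon ends and matching the flanking sequence to the genome; once an insertion coordinate is known, deciding which gene was disrupted is an interval lookup. The Python sketch below assumes a hypothetical annotation table (`GENES`) and uniformly random insertion sites, which is a simplification: real transposons show target-site preferences.

```python
import bisect
import random

# Hypothetical annotation: (start, end, name), 0-based half-open, sorted by start.
GENES = [(100, 1300, "geneA"), (1500, 2400, "geneB"), (2600, 3900, "geneC")]
GENOME_LENGTH = 5000


def disrupted_gene(insert_pos, genes=GENES):
    """Return the name of the gene containing the insertion, or None if intergenic."""
    starts = [start for start, _, _ in genes]
    idx = bisect.bisect_right(starts, insert_pos) - 1  # last gene starting at or before pos
    if idx >= 0:
        start, end, name = genes[idx]
        if start <= insert_pos < end:
            return name
    return None


def random_insertions(n, genome_length=GENOME_LENGTH, seed=0):
    """Draw n insertion sites uniformly at random (an idealization)."""
    rng = random.Random(seed)
    return [rng.randrange(genome_length) for _ in range(n)]
```

A screen would then pair each mutant's phenotype with `disrupted_gene(pos)` for its mapped insertion site.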

Within the context of the present invention, it may be useful to
modulate the expression of the identified isoprenoid pathway by any one of
the above-described methods. For example, the present invention
provides a number of genes encoding key enzymes in the terpenoid
pathway leading to the production of pigments and smaller isoprenoid
compounds. The isolated genes include the dxs and dxr genes; the ispA,
D, E, and F genes; and the crtE, B, I, and L genes. In particular, it may be
useful to upregulate the initial condensation of two 3-carbon compounds
(pyruvate and the C1 aldehyde group of D-glyceraldehyde 3-phosphate)
to yield the 5-carbon compound D-1-deoxyxylulose-5-phosphate, a step
mediated by the dxs gene product. Alternatively, if it is desired to produce
a specific non-pigment isoprenoid, it may be desirable to disrupt various
genes at the downstream end of the pathway. For example, the crtI gene
is known to encode phytoene dehydrogenase, which is part of the
carotenoid biosynthesis pathway. It may be desirable to use gene
disruption or antisense inhibition of this gene if a smaller, upstream
terpenoid is the desired product of the pathway.
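
The carbon arithmetic of the dxs-catalyzed step (two 3-carbon substrates giving a 5-carbon product) works because one carbon of pyruvate is released as CO2. The short Python check below confirms that the reaction is atom-balanced, using the neutral formulas of pyruvic acid (C3H4O3), D-glyceraldehyde 3-phosphate (C3H7O6P) and 1-deoxy-D-xylulose 5-phosphate (C5H11O7P); protonation states and the thiamine diphosphate cofactor are ignored.

```python
from collections import Counter

# Neutral molecular formulas as element -> atom count.
PYRUVIC_ACID = Counter({"C": 3, "H": 4, "O": 3})
GLYCERALDEHYDE_3P = Counter({"C": 3, "H": 7, "O": 6, "P": 1})
DEOXYXYLULOSE_5P = Counter({"C": 5, "H": 11, "O": 7, "P": 1})
CARBON_DIOXIDE = Counter({"C": 1, "O": 2})


def is_balanced(substrates, products):
    """True if every element has the same atom count on both sides."""
    return sum(substrates, Counter()) == sum(products, Counter())
```

Both sides total C6H11O9P, so `is_balanced([PYRUVIC_ACID, GLYCERALDEHYDE_3P], [DEOXYXYLULOSE_5P, CARBON_DIOXIDE])` is true, while omitting CO2 leaves one carbon and two oxygens unaccounted for.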

Where commercial production of the isoprenoid products of the
present genes is desired, a variety of culture methodologies may be
applied. For example, large-scale production of a specific gene product
overexpressed from a recombinant microbial host may be achieved by
both batch and continuous culture methodologies.

A classical batch culturing method is a closed system where the
composition of the media is set at the beginning of the culture and not
subject to artificial alterations during the culturing process. Thus, at the
beginning of the culturing process the media is inoculated with the desired
organism or organisms and growth or metabolic activity is permitted to
occur without adding anything to the system. Typically, however, a "batch"
culture is batch with respect to the addition of carbon source, and attempts
are often made at controlling factors such as pH and oxygen concentration.
In batch systems the metabolite and biomass compositions of the system
change constantly up to the time the culture is terminated. Within batch
cultures, cells moderate through a static lag phase to a high-growth log
phase and finally to a stationary phase where growth rate is diminished or
halted. If untreated, cells in the stationary phase will eventually die. Cells
in log phase are often responsible for the bulk of production of end product
or intermediate in some systems. Stationary or post-exponential phase
production can be obtained in other systems.
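
The log and stationary phases described above can be reproduced qualitatively with a standard Monod batch model: exponential growth while substrate is plentiful, then a plateau once it is exhausted. The Python sketch below is illustrative only; the kinetic parameters are assumptions, and the model omits the lag phase and cell death mentioned in the text. Because the system is closed, biomass plus yield-weighted substrate is conserved, so the final biomass approaches x0 + Y·s0.

```python
def simulate_batch(x0, s0, mu_max, ks, yield_xs, dt=0.01, t_end=48.0):
    """Euler integration of a closed batch culture with Monod kinetics.

    x: biomass (g/L); s: limiting substrate (g/L); mu_max in 1/h; ks in g/L;
    yield_xs: g biomass formed per g substrate consumed.
    No lag phase, maintenance or cell death is modeled.
    """
    x, s, t = x0, s0, 0.0
    history = [(t, x, s)]
    while t < t_end:
        mu = mu_max * s / (ks + s)            # specific growth rate, 1/h
        consumed = min(mu * x * dt / yield_xs, s)  # never use more substrate than is left
        x += consumed * yield_xs
        s -= consumed
        t += dt
        history.append((t, x, s))
    return history
```

For instance, `simulate_batch(0.05, 10.0, 0.5, 0.2, 0.5)` grows roughly exponentially for the first several hours and then levels off near 5.05 g/L once the substrate is used up.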

A variation on the standard batch system is the Fed-Batch system.
Fed-Batch culture processes are also suitable in the present invention and
comprise a typical batch system with the exception that the substrate is
added in increments as the culture progresses. Fed-Batch systems are
useful when catabolite repression is apt to inhibit the metabolism of the
cells and where it is desirable to have limited amounts of substrate in the
media. Measurement of the actual substrate concentration in Fed-Batch
systems is difficult, and it is therefore estimated on the basis of the changes
of measurable factors such as pH, dissolved oxygen and the partial
pressure of waste gases such as CO2. Batch and Fed-Batch culturing
methods are common and well known in the art, and examples may be
found in Thomas D. Brock in Biotechnology: A Textbook of Industrial
Microbiology, Second Edition (1989) Sinauer Associates, Inc., Sunderland,
MA, or Deshpande, Mukund V., Appl. Biochem. Biotechnol., 36, 227
(1992), herein incorporated by reference.
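
One common way to add substrate in increments while keeping it limiting is an exponential feed that matches the demand of cells growing at a chosen rate. The Python sketch below implements that textbook feed law, F(t) = μ·X0·V0 / (Y·S_F) · e^(μt). It is open-loop, unlike the measurement-based estimation described above; the parameter names are illustrative, and the formula assumes a constant yield, negligible maintenance and negligible residual substrate, so it is a starting point rather than a control strategy.

```python
import math


def exponential_feed_rate(t, mu_set, x0, v0, yield_xs, s_feed):
    """Feed flow rate (L/h) that supplies substrate as fast as cells growing
    at mu_set consume it (maintenance and residual substrate ignored).

    t: hours since feeding started; mu_set: target specific growth rate (1/h),
    chosen below the organism's maximum; x0, v0: biomass concentration (g/L)
    and volume (L) when feeding starts; yield_xs: g biomass per g substrate;
    s_feed: substrate concentration in the feed (g/L).
    """
    return mu_set * x0 * v0 / (yield_xs * s_feed) * math.exp(mu_set * t)
```

Integrating F(t)·S_F from 0 to t gives (X0·V0/Y)(e^(μt) − 1), exactly the substrate needed to grow the biomass from X0·V0 to X0·V0·e^(μt) at the assumed yield.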

Commercial production of the products of the present genes may
also be accomplished with a continuous culture. Continuous cultures are
an open system where a defined culture media is added continuously to a
bioreactor and an equal amount of conditioned media is removed
simultaneously for processing. Continuous cultures generally maintain the
cells at a constant high liquid phase density where cells are primarily in log
phase growth. Alternatively, continuous culture may be practiced with
immobilized cells where carbon and nutrients are continuously added, and
valuable products, by-products or waste products are continuously
removed from the cell mass. Cell immobilization may be performed using
a wide range of solid supports composed of natural and/or synthetic
materials.

Continuous or semi-continuous culture allows for the modulation of
one factor or any number of factors that affect cell growth or end product
concentration. For example, one method will maintain a limiting nutrient
such as the carbon source or nitrogen level at a fixed rate and allow all
other parameters to moderate. In other systems a number of factors
affecting growth can be altered continuously while the cell concentration,
measured by media turbidity, is kept constant. Continuous systems strive
to maintain steady-state growth conditions, and thus the cell loss due to
media being drawn off must be balanced against the cell growth rate in the
culture. Methods of modulating nutrients and growth factors for
continuous culture processes, as well as techniques for maximizing the
rate of product formation, are well known in the art of industrial
microbiology, and a variety of methods are detailed by Brock, supra.
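
The balance between cell loss and growth can be stated quantitatively for the simplest case, a single-substrate chemostat with Monod kinetics: at steady state the specific growth rate equals the dilution rate D. The Python sketch below computes the resulting steady state; the Monod model and the parameter values used in the example are assumptions for illustration, not conditions taken from this work.

```python
def chemostat_steady_state(dilution, mu_max, ks, s_in, yield_xs):
    """Steady state of a Monod chemostat.

    At steady state the specific growth rate equals the dilution rate D, so
    growth exactly replaces the cells washed out with the effluent.
    Returns (substrate, biomass) in g/L, or (s_in, 0.0) if the culture washes out.
    """
    if dilution >= mu_max:
        return s_in, 0.0
    s_star = ks * dilution / (mu_max - dilution)  # substrate level giving mu = D
    if s_star >= s_in:  # the feed cannot support growth at this dilution rate
        return s_in, 0.0
    return s_star, yield_xs * (s_in - s_star)
```

With D = 0.25 1/h, μmax = 0.5 1/h, Ks = 0.1 g/L, a feed of 10 g/L and a yield of 0.5, the steady state is about 0.1 g/L substrate and 4.95 g/L biomass; raising D to 0.6 1/h exceeds μmax and the culture washes out.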

Fermentation media in the present invention must contain suitable
carbon substrates. Suitable substrates may include, but are not limited to,
monosaccharides such as glucose and fructose, oligosaccharides such as
lactose or sucrose, polysaccharides such as starch or cellulose or
mixtures thereof, and unpurified mixtures from renewable feedstocks such
as cheese whey permeate, cornsteep liquor, sugar beet molasses, and
barley malt. Additionally, the carbon substrate may also be one-carbon
substrates such as carbon dioxide, methane or methanol, for which
metabolic conversion into key biochemical intermediates has been
demonstrated. In addition to one- and two-carbon substrates,
methylotrophic organisms are also known to utilize a number of other
carbon-containing compounds such as methylamine, glucosamine and a
variety of amino acids for metabolic activity. For example, methylotrophic
yeast are known to utilize the carbon from methylamine to form trehalose
or glycerol (Bellion et al., Microb. Growth C1 Compd., [Int. Symp.], 7th
(1993), 415-32. Editor(s): Murrell, J. Collin; Kelly, Don P. Publisher:
Intercept, Andover, UK). Similarly, various species of Candida will
metabolize alanine or oleic acid (Sulter et al., Arch. Microbiol. 153:485-489
(1990)). Hence it is contemplated that the source of carbon utilized in the
present invention may encompass a wide variety of carbon-containing
substrates and will only be limited by the choice of organism.

Plants and algae are also known to produce isoprenoid compounds.
The nucleic acid fragments of the instant invention may be used to create
transgenic plants having the ability to express the microbial protein.
Preferred plant hosts will be any variety that will support a high production
level of the instant proteins. Suitable green plants will include, but are not
limited to, soybean, rapeseed (Brassica napus, B. campestris), sunflower
(Helianthus annuus), cotton (Gossypium hirsutum), corn, tobacco (Nicotiana
tabacum), alfalfa (Medicago sativa), wheat (Triticum sp.), barley (Hordeum
vulgare), oats (Avena sativa L.), sorghum (Sorghum bicolor), rice (Oryza
sativa), Arabidopsis, cruciferous vegetables (broccoli, cauliflower,
cabbage, parsnips, etc.), melons, carrots, celery, parsley, tomatoes,
potatoes, strawberries, peanuts, grapes, grass seed crops, sugar beets,
sugar cane, beans, peas, rye, flax, hardwood trees, softwood trees, and
forage grasses. Algal species include, but are not limited to, commercially
significant hosts such as Spirulina, Haematococcus, and Dunaliella.

Overexpression of the isoprenoid compounds may be accomplished by
first constructing chimeric genes of the present invention in which the coding
regions are operably linked to promoters capable of directing expression of
a gene in the desired tissues at the desired stage of development. For
reasons of convenience, the chimeric genes may comprise promoter
sequences and translation leader sequences derived from the same
genes. 3' non-coding sequences encoding transcription termination
signals must also be provided. The instant chimeric genes may also
comprise one or more introns in order to facilitate gene expression.

Any combination of any promoter and any terminator capable of
inducing expression of a coding region may be used in the chimeric
genetic sequence. Some suitable examples of promoters and terminators
include those from nopaline synthase (nos), octopine synthase (ocs) and
cauliflower mosaic virus (CaMV) genes. One type of efficient plant
promoter that may be used is a high-level plant promoter. Such
promoters, in operable linkage with the genetic sequences of the present
invention, should be capable of promoting expression of the present gene
product. High-level plant promoters that may be used in this invention
include the promoter of the small subunit (ss) of the ribulose-1,5-bisphosphate
carboxylase, for example from soybean (Berry-Lowe et al.,
J. Molecular and App. Gen., 1:483-498 (1982)), and the promoter of the
chlorophyll a/b binding protein. These two promoters are known to be
light-induced in plant cells (see, for example, Genetic Engineering of
Plants, an Agricultural Perspective, A. Cashmore, Plenum, NY (1983),
pages 29-38; Coruzzi, G. et al., The Journal of Biological Chemistry,
258:1399 (1983); and Dunsmuir, P. et al., Journal of Molecular and
Applied Genetics, 2:285 (1983)).

Plasmid vectors comprising the instant chimeric genes can then be
constructed. The choice of plasmid vector depends upon the method that
will be used to transform host plants. The skilled artisan is well aware of
the genetic elements that must be present on the plasmid vector in order
to successfully transform, select and propagate host cells containing the
chimeric gene. The skilled artisan will also recognize that different
independent transformation events will result in different levels and
patterns of expression (Jones et al., (1985) EMBO J. 4:2411-2418;
De Almeida et al., (1989) Mol. Gen. Genetics 218:78-86), and thus that
multiple events must be screened in order to obtain lines displaying the
desired expression level and pattern. Such screening may be
accomplished by Southern analysis of DNA blots (Southern, J. Mol. Biol.
98, 503 (1975)), Northern analysis of mRNA expression (Kroczek, J.
Chromatogr. Biomed. Appl., 618 (1-2) (1993) 133-145), Western analysis
of protein expression, or phenotypic analysis.

For some applications it will be useful to direct the instant proteins
to different cellular compartments. It is thus envisioned that the chimeric
genes described above may be further supplemented by altering the
coding sequences to encode enzymes with appropriate intracellular
targeting sequences, such as transit sequences (Keegstra, K., Cell
56:247-253 (1989)), signal sequences or sequences encoding
endoplasmic reticulum localization (Chrispeels, J. J., Ann. Rev. Plant Phys.
Plant Mol. Biol. 42:21-53 (1991)), or nuclear localization signals (Raikhel,
N., Plant Phys. 100:1627-1632 (1992)) added, and/or with targeting
sequences that are already present removed. While the references cited
give examples of each of these, the list is not exhaustive, and more
targeting signals of utility that are useful in the invention may be
discovered in the future.

It is contemplated that the present nucleotides may be used to
produce gene products having enhanced or altered activity. Various
methods are known for mutating a native gene sequence to produce a
gene product with altered or enhanced activity, including but not limited to
error-prone PCR (Melnikov et al., Nucleic Acids Research, (February 15,
1999) Vol. 27, No. 4, pp. 1056-1062); site-directed mutagenesis (Coombs
et al., Proteins (1998), 259-311, 1 plate. Editor(s): Angeletti, Ruth Hogue.
Publisher: Academic, San Diego, CA); and "gene shuffling"
(U.S. 5,605,793; U.S. 5,811,238; U.S. 5,830,721; and U.S. 5,837,458,
incorporated herein by reference).

The method of gene shuffling is particularly attractive due to its
facile implementation, high rate of mutagenesis and ease of
screening. The process of gene shuffling involves the restriction
endonuclease cleavage of a gene of interest into fragments of specific size
in the presence of additional populations of DNA regions of both similarity
to and difference from the gene of interest. This pool of fragments will then
be denatured and reannealed to create a mutated gene. The mutated gene
is then screened for altered activity.

The instant microbial sequences of the present invention may be
mutated and screened for altered or enhanced activity by this method.
The sequences should be double stranded and can be of various lengths
ranging from 50 bp to 10 kb. The sequences may be randomly digested
into fragments ranging from about 10 bp to 1000 bp, using restriction
endonucleases well known in the art (Maniatis, supra). In addition to the
instant microbial sequences, populations of fragments that are
hybridizable to all or portions of the microbial sequence may be added.
Similarly, a population of fragments which are not hybridizable to the
instant sequence may also be added. Typically these additional fragment
populations are added in about a 10- to 20-fold excess by weight as
compared to the total nucleic acid. Generally, if this process is followed, the
number of different specific nucleic acid fragments in the mixture will be
about 100 to about 1000. The mixed population of random nucleic acid
fragments is denatured to form single-stranded nucleic acid fragments
and then reannealed. Only those single-stranded nucleic acid fragments
having regions of homology with other single-stranded nucleic acid
fragments will reanneal. The random nucleic acid fragments may be
denatured by heating. One skilled in the art could determine the
conditions necessary to completely denature the double-stranded nucleic
acid. Preferably the temperature is from 80°C to 100°C. The nucleic acid
fragments may be reannealed by cooling. Preferably the temperature is
from 20°C to 75°C. Renaturation can be accelerated by the addition of
polyethylene glycol ("PEG") or salt. A suitable salt concentration may
range from 0 mM to 200 mM. The annealed nucleic acid fragments are
then incubated in the presence of a nucleic acid polymerase and dNTPs
(i.e., dATP, dCTP, dGTP and dTTP). The nucleic acid polymerase may be
the Klenow fragment, Taq polymerase or any other DNA polymerase
known in the art. The polymerase may be added to the random nucleic
acid fragments prior to annealing, simultaneously with annealing or after
annealing. The cycle of denaturation, renaturation and incubation in the
presence of polymerase is repeated for a desired number of times.
Preferably the cycle is repeated from 2 to 50 times, more preferably
from 10 to 40 times. The resulting nucleic acid is a
larger double-stranded polynucleotide ranging from about 50 bp to about
100 kb and may be screened for expression and altered activity by
standard cloning and expression protocols (Maniatis, supra).
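
The fragment-and-reassemble idea can be illustrated with a deliberately simplified in-silico model. The Python sketch below (hypothetical function names) cuts a sequence at random into fragments within a chosen size window, for example the 10 to 1000 bp range given above, and builds a chimera by taking each fragment from a randomly chosen parent. It is not a model of the biochemistry: annealing, polymerase extension, point mutations and the non-homologous fragment populations are all omitted, and the parents are assumed to be pre-aligned and of equal length.

```python
import random


def random_fragments(seq, min_len=10, max_len=1000, rng=None):
    """Cut seq at random points into consecutive fragments of min_len..max_len.

    Returns (start, fragment) pairs. Requires max_len >= 2 * min_len so that
    a valid cut always exists.
    """
    if max_len < 2 * min_len:
        raise ValueError("max_len must be at least twice min_len")
    if len(seq) < min_len:
        raise ValueError("sequence is shorter than min_len")
    if rng is None:
        rng = random.Random()
    cuts, pos = [], 0
    while len(seq) - pos > max_len:
        remaining = len(seq) - pos
        step = rng.randint(min_len, max_len)
        if remaining - step < min_len:  # never leave a tail shorter than min_len
            step = remaining - min_len
        pos += step
        cuts.append(pos)
    bounds = [0] + cuts + [len(seq)]
    return [(a, seq[a:b]) for a, b in zip(bounds, bounds[1:])]


def toy_shuffle(parents, min_len=10, max_len=1000, rng=None):
    """Assemble one chimera by copying each fragment from a randomly chosen parent."""
    length = len(parents[0])
    if any(len(p) != length for p in parents):
        raise ValueError("toy model needs aligned parents of equal length")
    if rng is None:
        rng = random.Random()
    pieces = []
    for start, frag in random_fragments(parents[0], min_len, max_len, rng):
        donor = rng.choice(parents)  # template switch at each fragment boundary
        pieces.append(donor[start:start + len(frag)])
    return "".join(pieces)
```

Each call to `toy_shuffle` yields one member of a library; in the laboratory method the whole pool is reassembled at once and then screened.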




wo 02/086094 PCT/US02/15033 

Furthermore, a hybrid protein can be assembled by fusion of
functional domains using the gene shuffling (exon shuffling) method
(Nixon et al., PNAS, 94:1069-1073 (1997)). The functional domain of the
instant gene can be combined with the functional domain of other genes to
create novel enzymes with desired catalytic function. A hybrid enzyme
may be constructed using the PCR overlap extension method and cloned
into the various expression vectors using techniques well known to those
skilled in the art.
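
The PCR overlap extension step mentioned above can be illustrated at the sequence level. In the Python sketch below (hypothetical function names; primer lengths are arbitrary), the inner primers carry 5' tails so that the two first-round products share an overlap, and the second round joins them into a single fused sequence. Primer melting temperatures, reading-frame checks and polymerase behavior are not modeled.

```python
_DNA_PAIR = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_DNA_PAIR)[::-1]


def overlap_primers(domain_a: str, domain_b: str, anneal: int = 20, overlap: int = 15):
    """Inner primers (5'->3') for fusing domain_a (upstream) to domain_b.

    The forward primer for B carries a tail copied from the end of A; the
    reverse primer for A carries a tail complementary to the start of B.
    """
    forward_b = domain_a[-overlap:] + domain_b[:anneal]
    reverse_a = revcomp(domain_a[-anneal:] + domain_b[:overlap])
    return forward_b, reverse_a


def first_round_products(domain_a: str, domain_b: str, overlap: int = 15):
    """Top strands of the two first-round PCR products made with the tailed primers."""
    return domain_a + domain_b[:overlap], domain_a[-overlap:] + domain_b


def splice(left: str, right: str, shared: int) -> str:
    """Second round: the products anneal over `shared` bases and are extended."""
    if left[-shared:] != right[:shared]:
        raise ValueError("first-round products do not overlap")
    return left + right[shared:]
```

With an overlap tail of k bases on each inner primer, the two first-round products share 2k bases, which is the region passed as `shared` to `splice`.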

Description of the Preferred Embodiments 

The original environmental sample containing Rhodococcus
erythropolis strain AN12 was obtained from a wastewater treatment facility.
One ml of activated sludge was inoculated directly into 10 ml of S12
medium. Aniline was used as the sole source of carbon and energy. The
culture was maintained by addition of 100 ppm aniline every 2-3 days.
The culture was diluted (1:100 dilution) every 14 days. Bacteria that utilize
aniline as a sole source of carbon and energy were further isolated and
purified on S12 agar. Aniline (5 µL) was placed on the interior of each
culture dish lid.

When the 16S rRNA gene of AN12 was sequenced and compared to
other 16S rRNA sequences in the GenBank sequence database, the 16S
rRNA gene of strain AN12 was found to have at least 98% similarity to the
16S rRNA gene sequences of the high G+C Gram-positive genus Rhodococcus.
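
Similarity figures such as the 98% quoted above are computed from a pairwise alignment. As a hedged illustration, and not the scoring used by GenBank's BLAST service (which reports identities over its own local alignments), the Python sketch below computes percent identity over an alignment that has already been made.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over the columns of a pairwise alignment ('-' marks a gap).

    Columns where both sequences have a gap are skipped; a gap opposite a base
    counts as a mismatch. Other conventions (e.g. excluding all gap columns, or
    dividing by the shorter sequence) give somewhat different values.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(aligned_a.upper(), aligned_b.upper())
               if not (a == "-" and b == "-")]
    if not columns:
        raise ValueError("alignment has no informative columns")
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)
```

For example, `percent_identity("ACGT", "ACGA")` returns 75.0, and a gap opposite a base, as in `"AC-T"` versus `"ACGT"`, also gives 75.0 under this convention.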

Table 1 summarizes the 10 genes identified by genome sequencing
from Rhodococcus erythropolis strain AN12 which are involved in the
isoprenoid pathway for carotenoid synthesis. The biochemical pathway
for carotenoid synthesis and the putative assignment of the gene functions
are shown in Figure 1.

Rhodococcus erythropolis AN12 is naturally pigmented. The
pigment of AN12 was extracted and compared to the carotenoid pigment
of Rhodococcus erythropolis strain ATCC 47072. Pigments from both
strains were extracted into acetone, dried under nitrogen, and redissolved
in methanol. Soluble materials from both strains were analyzed by HPLC.
The pigment from AN12 showed a profile similar to that of the carotenoid
pigment from strain ATCC 47072 in HPLC analysis (Figure 2). The
molecular weight of the major pigment in strain ATCC 47072 was
determined to be 550 daltons by MALDI-MS and LC-MS analysis.

The dxs gene encodes the 1-deoxyxylulose-5-phosphate synthase
that catalyzes the first step, the synthesis of 1-deoxyxylulose-5-phosphate
from glyceraldehyde-3-phosphate and pyruvate precursors, in
the isoprenoid pathway. When dxs genes with upstream promoter regions
of different DNA lengths from AN12 were cloned into the multicopy
shuttle vector, electroporated into the ATCC 47072 host and overexpressed,
the transformed colonies appeared darker than the colonies carrying the
vector control. Carotenoid production in the transformed colonies was
evaluated spectrophotometrically and by HPLC. Increased carotenoid
production was observed in transformed colonies (Table 2).

The activity of the present genes and gene products has been
confirmed by a study showing the loss of carotenoid production in strain
ATCC 47072 when the gene was disrupted by homologous recombination.
The targeted genes were crtE and crtI. Truncated portions of the crtE and
crtI genes from strain ATCC 47072 were amplified using PCR. The primer
sequences for PCR were based on the AN12 sequence. The amplified
fragments of the crtE and crtI genes had about 95% identity at the DNA level
to the respective genes from strain AN12. The crtE fragment and the crtI
fragment were first cloned into the pCR2.1 TOPO vector (Invitrogen, Carlsbad,
CA). The TOPO clones were digested with NcoI and the crtE or crtI
fragments were subsequently cloned into the NcoI site of pBR328. The
resulting constructs were confirmed by sequencing and designated
pDCQ100 for the crtE clone and pDCQ101 for the crtI clone.
Approximately 1 µg of DNA of pDCQ100 and pDCQ101 was introduced
into Rhodococcus ATCC 47072 by electroporation and plated on NBYE
plates with 10 µg/ml tetracycline. The pBR328 vector does not replicate in
Rhodococcus; therefore, the tetracycline-resistant transformants obtained
after 3-4 days of incubation at 30°C were generated by chromosomal
integration. Integration into the targeted crtE or crtI gene on the chromosome
of ATCC 47072 was confirmed by PCR. Vector-specific primers paired
with gene-specific primers were used for PCR, with chromosomal DNA
prepared from the tetracycline-resistant transformants as the templates.
PCR fragments of the expected sizes were amplified from the
tetracycline-resistant transformants, but no PCR product was obtained
from wild-type ATCC 47072. When the two gene-specific primers were
used, no PCR fragment was obtained with the tetracycline-resistant
transformants, due to the insertion of the large vector DNA. The PCR
fragments obtained with the vector-specific primers and the gene-specific
primers were sequenced. Sequence analysis of the junction of the
vector and the crtE or crtI gene confirmed that single-crossover
recombination occurred at the expected sites and disrupted the target
genes crtE or crtI.
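
The PCR logic used above to confirm integration can be written down as a small decision table. The Python sketch below is a toy model with hypothetical amplicon sizes and an assumed maximum amplifiable length standing in for the PCR conditions; it reproduces the observed pattern, namely a vector-plus-gene product only from integrants, and no gene-plus-gene product from integrants because the inserted plasmid makes the target too long to amplify.

```python
def expect_product(primers, insert_len, gene_amplicon=1000, junction_amplicon=800,
                   max_amplicon=3000):
    """Predict whether a diagnostic PCR gives a product (toy model).

    primers: "gene+gene" (two gene-specific primers) or "vector+gene".
    insert_len: bp of integrated plasmid DNA between the gene primers
                (0 for the wild type).
    All lengths are hypothetical; max_amplicon is an assumed practical limit.
    """
    if primers == "gene+gene":
        return gene_amplicon + insert_len <= max_amplicon
    if primers == "vector+gene":
        # The vector primer has no binding site unless the plasmid integrated.
        return insert_len > 0 and junction_amplicon <= max_amplicon
    raise ValueError(f"unknown primer pair: {primers!r}")
```

Under these assumptions a wild-type template (insert_len = 0) gives only the gene-plus-gene product, while an integrant carrying a plasmid of several kilobases gives only the junction product.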

The phenotypes of the CrtE and CrtI disruption mutants of
ATCC 47072 were analyzed. Colonies of the CrtE or CrtI disruption mutants
were pale white. It appeared that the pigments present in the wild-type
strain were lost in both mutants. HPLC analysis of the carotenoids of the
mutants confirmed the visual inspection result.

The CrtI disruption mutant did not have the two HPLC peaks
present in the wild-type strain when monitored at 450 nm (Table 3).
These results confirmed the role of the CrtI protein in carotenoid
biosynthesis. Knockout of the crtI gene resulted in no carotenoid pigment,
as represented by the two HPLC peaks at 450 nm. Phytoene (colorless)
accumulation in the CrtI disruption mutant confirms the function of the CrtI
protein as the phytoene dehydrogenase, as suggested by the BLAST
search.

The CrtE disruption mutant had neither the two HPLC peaks
present in the wild type nor the phytoene peak seen in the CrtI disruption
mutant. These results also confirmed the role of the CrtE protein in
carotenoid biosynthesis. The absence of phytoene accumulation in the CrtE
disruption mutant was consistent with the function of the CrtE protein as
geranylgeranyl pyrophosphate synthase, which acts prior to the phytoene
synthesis step in the pathway.

The lycopene cyclase (ORF 10) identified in Rhodococcus
erythropolis strain AN12 showed high sequence similarity to the CrtL-type
lycopene cyclases of plants and cyanobacteria (Table 1). The trialkyl
amine compounds 2-(4-methylphenoxy)-triethylamine hydrochloride
(MPTA) and 2-(4-chlorophenylthio)-triethylamine hydrochloride (CPTA)
have been shown to specifically inhibit the CrtL-type lycopene cyclases
and not the non-photosynthetic bacterial CrtY-type lycopene cyclases
(Cunningham, Jr., et al., Molecular structure and enzymatic function of
lycopene cyclase from the cyanobacterium Synechococcus sp. strain
PCC7942, The Plant Cell, 1994, Vol. 6:1107). The effect of MPTA or
CPTA on carotenoid production in Rhodococcus erythropolis
(strain ATCC 47072) was examined. In the presence of 40 µM MPTA
or CPTA, carotenoid production was significantly decreased using
lycopene as a substrate.

EXAMPLES

The present invention is further defined in the following Examples.
It should be understood that these Examples, while indicating preferred
embodiments of the invention, are given by way of illustration only. From
the above discussion and these Examples, one skilled in the art can
ascertain the essential characteristics of this invention, and without
departing from the spirit and scope thereof, can make various changes
and modifications of the invention to adapt it to various usages and
conditions.

GENERAL METHODS

Standard recombinant DNA and molecular cloning techniques used
in the Examples are well known in the art and are described by Sambrook,
J., Fritsch, E. F. and Maniatis, T., Molecular Cloning: A Laboratory Manual,
Cold Spring Harbor Laboratory Press: Cold Spring Harbor (1989)
(Maniatis); by T. J. Silhavy, M. L. Berman, and L. W. Enquist,
Experiments with Gene Fusions, Cold Spring Harbor Laboratory, Cold
Spring Harbor, NY (1984); and by Ausubel, F. M. et al., Current Protocols
in Molecular Biology, published by Greene Publishing Assoc. and
Wiley-Interscience (1987).

Materials and methods suitable for the maintenance and growth of
bacterial cultures are well known in the art. Techniques suitable for use in
the following examples may be found as set out in Manual of Methods for
General Bacteriology (Phillipp Gerhardt, R. G. E. Murray, Ralph N.
Costilow, Eugene W. Nester, Willis A. Wood, Noel R. Krieg and G. Briggs
Phillips, eds), American Society for Microbiology, Washington, DC (1994),
or by Thomas D. Brock in Biotechnology: A Textbook of Industrial
Microbiology, Second Edition, Sinauer Associates, Inc., Sunderland, MA
(1989). All reagents, restriction enzymes and materials used for the
growth and maintenance of bacterial cells were obtained from Aldrich
Chemicals (Milwaukee, WI), DIFCO Laboratories (Detroit, MI),
GIBCO/BRL (Gaithersburg, MD), or Sigma Chemical Company (St. Louis,
MO) unless otherwise specified.

The meaning of abbreviations is as follows: "h" means hour(s),
"min" means minute(s), "sec" means second(s), "d" means day(s), "ml"
means milliliters, "L" means liters.
EXAMPLE 1
Isolation and Characterization of Strain AN12
Example 1 describes the isolation of strain AN12 of Rhodococcus
erythropolis on the basis of its ability to grow on aniline as the sole
source of carbon and energy. Analysis of a 16S rRNA gene sequence
indicated that strain AN12 was related to high G + C Gram positive
bacteria belonging to the genus Rhodococcus.

Bacteria that grew on aniline were isolated from an enrichment
culture. The enrichment culture was established by inoculating 1 ml of
activated sludge into 10 ml of S12 medium (10 mM ammonium sulfate,
50 mM potassium phosphate buffer (pH 7.0), 2 mM MgCl2, 0.7 mM CaCl2,
50 µM MnCl2, 1 µM FeCl3, 1 µM ZnCl2, 1.72 µM CuSO4, 2.53 µM CoCl2,
2.42 µM Na2MoO2, and 0.0001% FeSO4) in a 125 ml screw cap
Erlenmeyer flask. The activated sludge was obtained from a wastewater
treatment facility. The enrichment culture was supplemented with
100 ppm aniline added directly to the culture medium and was incubated
at 25°C with reciprocal shaking. The enrichment culture was maintained
by adding 100 ppm of aniline every 2-3 days. The culture was diluted
every 14 days by replacing 9.9 ml of the culture with the same volume of
S12 medium. Bacteria that utilized aniline as a sole source of carbon and
energy were isolated by spreading samples of the enrichment culture onto
S12 agar. Aniline (5 µL) was placed on the interior of each petri dish lid.
The petri dishes were sealed with parafilm and incubated upside down at
room temperature (approximately 25°C). Representative bacterial
colonies were then tested for the ability to use aniline as a sole source of
carbon and energy. Colonies were transferred from the original S12 agar
plates used for initial isolation to new S12 agar plates and supplied with
aniline on the interior of each petri dish lid. The petri dishes were sealed
with parafilm and incubated upside down at room temperature
(approximately 25°C).

The 16S rRNA genes of each isolate were amplified by PCR and
analyzed as follows. Each isolate was grown on R2A agar (Difco
Laboratories, Bedford, MA). Several colonies from a culture plate were
suspended in 100 µl of water. The mixture was frozen and then thawed
once. The 16S rRNA gene sequences were amplified by PCR using a
commercial kit according to the manufacturer's instructions (Perkin Elmer)
with primers HK12 (5'-GAGTTTGATCCTGGCTCAG-3') (SEQ ID NO:21)
and HK13 (5'-TACCTTGTTACGACTT-3') (SEQ ID NO:22). PCR was
performed in a Perkin Elmer GeneAmp 9600 (Norwalk, CT). The samples
were incubated for 5 min at 94°C and then cycled 35 times at 94°C for
30 sec, 55°C for 1 min, and 72°C for 1 min. The amplified 16S rRNA
genes were purified using a commercial kit according to the
manufacturer's instructions (QIAquick PCR Purification Kit, Qiagen,
Valencia, CA) and sequenced on an automated ABI sequencer. The
sequencing reactions were initiated with primers HK12, HK13, and HK14
(5'-GTGCCAGCAGYMGCGGT-3') (SEQ ID NO:23, where Y = C or T, M = A
or C). The 16S rRNA gene sequence of each isolate was used as the
query sequence for a BLAST search [Altschul, et al., Nucleic Acids Res.
25:3389-3402 (1997)] of GenBank for similar sequences.
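Primer HK14 contains two degenerate positions (Y and M), so it is in effect a mixture of concrete primers. The following Python sketch is illustrative only and is not part of the described method; it expands a degenerate primer using the standard IUPAC nucleotide ambiguity codes, of which only Y = C or T and M = A or C are stated in the text above.

```python
from itertools import product

# IUPAC nucleotide ambiguity codes; Y (C/T) and M (A/C) are the ones used in HK14.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_degenerate(primer: str) -> list[str]:
    """Return every concrete sequence encoded by a degenerate primer."""
    choices = [IUPAC[base] for base in primer.upper()]
    return ["".join(combo) for combo in product(*choices)]


# HK14 (SEQ ID NO:23) has two 2-fold degenerate positions, i.e. 2 x 2 = 4 primers.
for variant in expand_degenerate("GTGCCAGCAGYMGCGGT"):
    print(variant)
```

Each additional twofold degenerate position doubles the number of distinct primer species in the mixture.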

A 16S rRNA gene of strain AN12 was sequenced and compared to
other 16S rRNA sequences in the GenBank sequence database. The 16S
rRNA gene sequence from strain AN12 was about 98% similar to the 16S
rRNA gene sequences of high G + C Gram positive bacteria belonging to
the genus Rhodococcus.
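A figure such as "about 98% similar" is a ratio of matching positions to compared positions in an alignment. The text does not state which denominator convention was used for that value. The Python sketch below shows one common convention for two sequences that have already been aligned. It is an assumption for illustration: columns where both sequences have a gap are skipped, and columns with a gap in only one sequence count as mismatches.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences of equal length.

    Gap-gap columns are skipped; a gap opposite a residue counts as a mismatch.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical aligned fragments (not AN12 data): one substitution in ten columns.
print(percent_identity("ACGTACGTAC", "ACGTTCGTAC"))  # 90.0
```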

EXAMPLE 2

Preparation of AN12 Genomic DNA for Sequencing and Sequence
Generation

Genomic DNA preparation. Rhodococcus erythropolis AN12 was
grown in 25 mL NBYE medium (0.8% nutrient broth, 0.5% yeast extract,
0.05% Tween 80) to mid-log phase at 37°C with aeration. Bacterial cells
were centrifuged at 4,000 g for 30 min at 4°C. The cell pellet was washed
once with 20 ml 50 mM Na2CO3 containing 1 M KCl (pH 10) and then with
20 ml 50 mM NaOAc (pH 5). The cell pellet was gently resuspended in
5 ml of 50 mM Tris-10 mM EDTA (pH 8), and lysozyme was added to a
final concentration of 2 mg/mL. The suspension was incubated at 37°C for
2 h. Sodium dodecyl sulfate was then added to a final concentration of
1%, and proteinase K was added to a final concentration of 100 µg/ml. The
suspension was incubated at 55°C for 5 h. The suspension became clear,
and the clear lysate was extracted with an equal volume of
phenol:chloroform:isoamyl alcohol (25:24:1). After centrifuging at 17,000 g
for 20 min, the aqueous phase was carefully removed and transferred to a
new tube. Two volumes of ethanol were added and the DNA was gently
spooled with a sealed glass Pasteur pipet. The DNA was dipped into a
tube containing 70% ethanol, then air dried. After air drying, the DNA was
resuspended in 400 µl of TE (10 mM Tris-1 mM EDTA, pH 8) with RNaseA
(100 µg/mL) and stored at 4°C.
Library construction. 200 to 500 µg of chromosomal DNA was
resuspended in a solution of 300 mM sodium acetate, 10 mM Tris-HCl,
1 mM Na-EDTA, and 30% glycerol, and sheared at 12 psi for 60 sec in an
Aeromist Downdraft Nebulizer chamber (IBI Medical products, Chicago,
IL). The DNA was precipitated, resuspended and treated with Bal31
nuclease (New England Biolabs, Beverly, MA). After size fractionation by
0.8% agarose gel electrophoresis, a fraction (2.0 kb or 5.0 kb) was
excised and cleaned, and a two-step ligation procedure was used to produce a
high titer library with greater than 99% single inserts.

Sequencing. A shotgun sequencing strategy was
adopted for the sequencing of the whole microbial genome (Fleischmann,
Robert et al., Whole-Genome Random Sequencing and Assembly of
Haemophilus influenzae Rd, Science 269 (1995)).

Sequence was generated on an ABI Automatic sequencer using
dye terminator technology (U.S. 5366860; EP 272007) with a
combination of vector and insert-specific primers. Sequence editing was
performed in either DNAStar (DNA Star Inc., Madison, WI) or the
Wisconsin GCG program (Wisconsin Package Version 9.0, Genetics
Computer Group (GCG), Madison, WI) and the CONSED package
(version 7.0). All sequences represent coverage of at least two times in both
directions.
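The acceptance criterion above (every base read at least twice in each direction) can be checked mechanically. The Python sketch below is not the CONSED procedure. It assumes a hypothetical read representation of (start, end, strand) with 0-based, half-open coordinates, and it uses made-up reads over a 10-base contig.

```python
def min_strand_depth(length: int, reads: list[tuple[int, int, str]]) -> dict[str, int]:
    """Minimum per-base read depth on each strand.

    Reads are (start, end, strand) with 0-based, half-open coordinates and
    strand "+" or "-".
    """
    if length <= 0:
        raise ValueError("length must be positive")
    depth = {"+": [0] * length, "-": [0] * length}
    for start, end, strand in reads:
        for pos in range(max(start, 0), min(end, length)):
            depth[strand][pos] += 1
    return {strand: min(values) for strand, values in depth.items()}


def covered_twice_both_strands(length: int, reads: list[tuple[int, int, str]]) -> bool:
    """True if every base is read at least twice in each direction."""
    return all(d >= 2 for d in min_strand_depth(length, reads).values())


# Hypothetical reads over a 10-base contig.
reads = [(0, 6, "+"), (4, 10, "+"), (0, 10, "+"),
         (0, 10, "-"), (0, 5, "-"), (5, 10, "-")]
print(min_strand_depth(10, reads))            # {'+': 2, '-': 2}
print(covered_twice_both_strands(10, reads))  # True
```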

EXAMPLE 3

Identification of ORFs in the Isoprenoid Pathway from Strain AN12
ORFs 1-10 were identified by conducting BLAST (Basic Local
Alignment Search Tool; Altschul, S. F., et al., (1990) J. Mol. Biol.
215:403-410; see also www.ncbi.nlm.nih.gov/BLAST/) searches for
similarity to sequences contained in the BLAST "nr" database (comprising
all non-redundant (nr) GenBank CDS translations, sequences derived from
the 3-dimensional structure Brookhaven Protein Data Bank, the SWISS-
PROT protein sequence database, EMBL, and DDBJ databases). The
sequences obtained in Example 2 were analyzed for similarity to all
publicly available DNA sequences contained in the "nr" database using the
BLASTN algorithm provided by the National Center for Biotechnology
Information (NCBI). The DNA sequences were translated in all reading
frames and compared for similarity to all publicly available protein
sequences contained in the "nr" database using the BLASTX algorithm
(Altschul, S. F., et al., Nucleic Acids Res. 25:3389-3402 (1997)) provided by
the NCBI. The results of the BLAST comparison are given in Table 1, which
summarizes the sequences to which they have the most similarity.
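BLASTX compares a DNA query against proteins by first translating it in all six reading frames: three on the given strand and three on the reverse complement. The following Python sketch illustrates that translation step. It assumes the standard genetic code (NCBI translation table 1) and marks any codon containing a non-ACGT character as "X". It is not NCBI's implementation.

```python
BASES = "TCAG"
# NCBI standard genetic code (table 1); codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def translate(seq: str) -> str:
    """Translate complete codons from the start of seq; '*' marks stop codons."""
    return "".join(CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3))


def six_frame_translation(dna: str) -> dict[str, str]:
    """Return the translations of frames +1..+3 (forward) and -1..-3 (reverse complement)."""
    dna = dna.upper()
    reverse_complement = dna.translate(COMPLEMENT)[::-1]
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(reverse_complement[offset:])
    return frames


for frame, protein in six_frame_translation("ATGGCC").items():
    print(frame, protein)
```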
Table 1 displays data based on the BLAST algorithm, with values reported
as expect values. The Expect value estimates the statistical significance of
the match, specifying the number of matches with a given score that are
expected by chance alone in a search of a database of this size.
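The expect value is commonly computed with the Karlin-Altschul formula E = K·m·n·e^(-λS). In this formula, S is the raw alignment score, m and n are the query and database lengths, and λ and K depend on the scoring matrix and gap costs. The Python sketch below uses that textbook form. It assumes illustrative parameter values, λ = 0.267 and K = 0.041, which are often quoted for BLOSUM62 with gap costs 11/1. It omits the effective-length corrections that BLAST applies, so it will not reproduce the values in Table 1.

```python
import math


def expect_value(raw_score: float, query_len: int, db_len: int, lam: float, k: float) -> float:
    """Karlin-Altschul estimate E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score: float, lam: float, k: float) -> float:
    """Normalized score S' = (lambda * S - ln K) / ln 2, so that E = m * n * 2**(-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Illustrative parameters only; real values depend on the scoring system.
LAM, K = 0.267, 0.041
m, n = 400, 1_000_000_000  # hypothetical query and database lengths
for s in (50, 100, 200):
    print(s, f"{expect_value(s, m, n, LAM, K):.3g}")
```

Because E scales linearly with database size, the same score becomes less significant as the database grows. This is why the text ties the Expect value to "a database of this size".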
Table 1 summarizes the ten genes we identified by genome
sequencing of Rhodococcus erythropolis strain AN12 that are
involved in the isoprenoid pathway for carotenoid synthesis. The
biochemical pathway for carotenoid synthesis and the putative
assignment of gene function are shown in Figure 1.

The top hits from the BLAST search for ORF3 and ORF5 were
originally annotated as hypothetical proteins from Mycobacterium
tuberculosis. The genes encoding these two hypothetical proteins are
linked on the Mycobacterium chromosome. The upstream gene Rv3582c,
encoding the protein with homology to ORF 3, was later identified as a
homolog of ygbP (ispD) encoding 4-diphosphocytidyl-2C-methyl-D-
erythritol synthase (Rohdich, et al., 1999, PNAS 96:11758). The
downstream gene Rv3581c, encoding the protein with homology to ORF 5,
was later identified as a homolog of ygbB (ispF) encoding 2C-methyl-D-
erythritol 2,4-cyclodiphosphate synthase (Herz, et al., 2000, PNAS
97:2486). ORF 3 and ORF 5 are also closely adjacent on the
chromosome of Rhodococcus strain AN12, with the same organization as
the ygbP and ygbB homologs in M. tuberculosis, E. coli, H. influenzae and
B. subtilis (Rohdich, et al., 1999, PNAS 96:11758). Two other genes, crtB
(ORF8) and crtI (ORF9), are also linked on the AN12 chromosome.

ORF 10 had homology to β-lycopene cyclases that add β-cyclic
groups to the ends of the lycopene substrate. There are two classes of β-
lycopene cyclases that are functionally very similar: the crtL-type of
cyclases from cyanobacteria and plants, and the crtY-type of cyclases
from other bacteria. Despite the functional similarity, these two classes of
cyclases share limited structural similarity. ORF 10 showed the highest
similarity to the lycopene cyclase from Deinococcus radiodurans. The
lycopene cyclases from Rhodococcus erythropolis strain AN12 and
Deinococcus radiodurans strain R1 both showed higher homology to the plant
crtL-type of lycopene cyclases than to the bacterial crtY-type of lycopene
cyclases.

EXAMPLE 4

Carotenoid Pigments Produced by Rhodococcus Strains
Rhodococcus erythropolis strains ATCC 47072 and AN12 are
naturally pigmented. The pink color of the two strains indicates production
of carotenoid pigments in these two strains. The carotenoid pigments in
ATCC 47072 and AN12 were extracted and analyzed by HPLC. For each
Rhodococcus strain, 100 ml of cell culture in NBYE (0.8% nutrient broth +
0.5% yeast extract) was grown at 26°C overnight with shaking to
stationary phase. Cells were spun down at 4000 g for 15 min, and the cell
pellets were resuspended in 10 ml acetone. Carotenoids were extracted
into acetone with constant shaking at room temperature. After 1 hour, the
cells were spun down under the same conditions as above and the supernatant
was collected. The extraction was repeated once, and the supernatants of
both extractions were combined and dried under nitrogen. The dried
material was re-dissolved in 0.5 ml methanol, and insoluble material was
removed by centrifugation at 16,000 g for 2 min in an Eppendorf
microcentrifuge 5415C. 0.1 ml of the sample was used for HPLC analysis.

A Beckman System Gold® HPLC with Beckman Gold Nouveau
Software (Columbia, MD) was used for the study. 0.1 ml of the crude
acetone extract was loaded onto a 125 x 4 mm RP8 (5 µm particles)
column with a corresponding guard column (Hewlett-Packard, San
Fernando, CA). The flow rate was 1 ml/min. The solvent program was:
0-11.5 min, 40% water/60% methanol; 11.5-20 min, 100% methanol;
20-30 min, 40% water/60% methanol. The spectrum data were collected
by the Beckman photodiode array detector (model 168).
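The solvent program above is a step program, so the mobile phase composition at any retention time can be read off directly. The Python sketch below encodes it as a lookup table. It treats each change as instantaneous, which is an assumption: it ignores pump dwell volume and mixing delay.

```python
# Step program from the text: (start_min, end_min, % methanol); the remainder is water.
SOLVENT_PROGRAM = [
    (0.0, 11.5, 60.0),
    (11.5, 20.0, 100.0),
    (20.0, 30.0, 60.0),
]


def mobile_phase(t_min: float) -> tuple[float, float]:
    """Return (% methanol, % water) at time t_min, treating each change as an instant step."""
    for start, end, methanol in SOLVENT_PROGRAM:
        if start <= t_min < end:
            return methanol, 100.0 - methanol
    raise ValueError(f"{t_min} min is outside the 0-30 min program")


# The major carotenoid peak (14.6 min) elutes during the 100% methanol step.
print(mobile_phase(14.6))  # (100.0, 0.0)
```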

The Rhodococcus strains ATCC 47072 and AN12 showed very
similar profiles of carotenoid pigments (Figure 2) by HPLC analysis.
Both had a major HPLC peak with an elution time of 14.6 min when
monitored at 450 nm. The absorption maximum of the major peak is
465 nm. A minor peak was also present in both strains, with an elution
time of 15.6 min. The absorption maxima of the minor peak are 435 nm,
458 nm, and 486 nm. These data indicate the presence of similar or
identical carotenoids in these two Rhodococcus strains. The molecular
weights of the major and minor carotenoids in these two strains were
also determined. Carotenoids were extracted into methanol from the cell
pellet and saponified with 5% KOH in methanol overnight at room
temperature. After saponification, the majority of carotenoids were
extracted into hexane. The extracted sample was first passed through a
silica gel column to separate the carotenoids from neutral lipids. The column (1.5 cm x
20 cm) was packed with silica gel 60 (particle size 0.040-0.063 mm, EM
Science, Gibbstown, NJ) and washed with hexane. The carotenoid
sample was loaded, washed with 95% hexane + 5% acetone, and eluted
with 80% hexane + 20% acetone. The eluted carotenoids were further
separated on a reverse phase C18 thin layer chromatography (TLC) plate
(J. T. Baker, Phillipsburg, NJ) with 80% acetonitrile + 20% acetone as the
mobile phase. The major carotenoid band (Rf 0.5) was excised and eluted
with acetone. The molecular weight (MW) of the purified major carotenoid
peak of ATCC 47072 was determined by MALDI-MS to be 550 daltons.
This was confirmed by LC-MS with APCI (atmospheric pressure chemical
ionization), which showed the MW of the protonated compound to be 551
daltons. LC-MS also showed the molecular weight of the minor peak
carotenoid of ATCC 47072 to be 536 daltons (537 daltons for the protonated
form). Mass spectrometry analysis of carotenoids from AN12 showed that
the molecular weights of the major peak carotenoid (550 daltons) and the
minor peak carotenoid (536 daltons) of AN12 were identical to those of
ATCC 47072. Based on the HPLC results, the spectrum analysis and the
molecular weight determination, it is likely that the carotenoids produced by
AN12 and ATCC 47072 are identical and that the genes involved in
carotenoid production are homologous. The structures of the carotenoids
have not yet been determined.
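The 1-dalton offset between the MALDI-MS value and the APCI value follows from protonation. The observed ion is [M+H]+, with m/z = (M + z·m_p)/z, where m_p ≈ 1.007276 Da is the proton mass. The Python sketch below applies that relation to the two nominal masses reported above. The inputs are the rounded values from the text, not measured exact masses.

```python
PROTON_MASS_DA = 1.007276  # mass of a proton


def protonated_mz(neutral_mass_da: float, charge: int = 1) -> float:
    """m/z of the [M+zH]z+ ion for a neutral molecule of the given mass."""
    if charge < 1:
        raise ValueError("charge must be a positive integer")
    return (neutral_mass_da + charge * PROTON_MASS_DA) / charge


for label, mass in (("major carotenoid", 550.0), ("minor carotenoid", 536.0)):
    print(label, round(protonated_mz(mass), 3))
```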

EXAMPLE 5

Increased Carotenoid Production With Multicopy Expression of Dxs
The dxs gene encodes the 1-deoxyxylulose-5-phosphate synthase
that catalyzes the first step of the synthesis of 1-deoxyxylulose-5-
phosphate from glyceraldehyde-3-phosphate and pyruvate precursors in
the isoprenoid pathway. An effort was made to express the putative dxs
gene from AN12 on a multicopy shuttle vector and to determine the effect of
dxs expression on carotenoid production. The dxs gene with its
native promoter was amplified from Rhodococcus AN12 strain by PCR.
Two upstream primers, New dxs 5' primer 5'-ATT TCG TTG AAC GGC
TCG C&-3' (SEQ ID NO:24) and New2 dxs 5' primer 5'-CGG CAA TCC
GAC CTC TAC CA-3' (SEQ ID NO:25), were designed to include the
native promoter region of dxs with different lengths. The downstream
primer, New dxs 3' primer 5'-TGA GAC GAG CCG GCC TT-3' (SEQ
ID NO:26), included the stop codon of the dxs gene. PCR
amplification of AN12 total DNA using New dxs 5' + New dxs 3' yielded
one product of 2519 bp in size, which included the full length AN12 dxs
coding region and about 500 bp of immediate upstream region (nt #500 -
#3019). When the New2 dxs 5' + New dxs 3' primer pair was used, the PCR
product was 2985 bp in size, including the complete AN12 dxs gene and
about 1 kb of upstream region (nt #34 - #3019). Both PCR products were
first cloned in the pCR2.1-TOPO cloning vector according to the
manufacturer's instructions (Invitrogen, Carlsbad, CA). Resulting clones
were screened and sequenced. The confirmed plasmids were digested
with EcoRI, and the 2.5 kb and 3.0 kb fragments containing dxs and the
upstream region from each plasmid were treated with the Klenow enzyme
and cloned into the unique SspI site in the E. coli-Rhodococcus shuttle
plasmid pRhBR171 (CL1709). The resulting constructs pDCQ22 (clones
#4 and #7) and pDCQ23 (clones #10 and #11) were electroporated into
Rhodococcus erythropolis ATCC 47072 with 10 µg/ml tetracycline
selection. The pigment of the Rhodococcus transformants appeared
darker compared to the vector control. To quantify the carotenoid
production of each Rhodococcus strain, 1 ml of freshly cultured cells was
added to 200 ml fresh LB medium with 0.05% Tween-80 and 10 µg/ml
tetracycline, and grown at 30°C for 3 days to stationary phase. Cells were
pelleted by spinning at 4000 g for 15 min, and the wet weight was
measured for each cell pellet. Carotenoids were extracted from the cell
pellets into 10 ml acetone overnight with shaking and quantitated
spectrophotometrically at the absorbance maximum (465 nm) of the major
carotenoid of ATCC 47072. The absorbance indicating the amount of
carotenoids produced was normalized in each strain based on the cell
paste weight or the cell density (OD600). Carotenoid production
calculated by either method showed about a 1.6-fold increase in
ATCC 47072 with pDCQ22, which contains dxs with the shorter
promoter region. Carotenoid production increased even more (2.2-fold)
when dxs was expressed with the longer promoter region. It is likely that
the 1 kb upstream DNA contains the promoter and some elements that
enhance expression. HPLC analysis also verified that the
same carotenoids were produced in the dxs expression strains as in
the wild type strain.
Table 2. Carotenoid production by Rhodococcus strains.

Strain | OD600 | Weight (g) | OD465 | % (OD465) | % (OD600) | % (weight) | % (avg)
ATCC 47072 (pRhBR171) | 1.992 | 2.82 | 0.41 | 100 | 100 | 100 | 100
ATCC 47072 (pDCQ22) #4 | 1.93 | 2.9 | 0.642 | 157 | 161 | 152 | 156
ATCC 47072 (pDCQ22) #7 | 1.922 | 2.76 | 0.664 | 162 | 159 | 156 | 157
ATCC 47072 (pDCQ23) #10 | 1.99 | 2.58 | 0.958 | 234 | 214 | 233 | 224
ATCC 47072 (pDCQ23) #11 | 1.994 | 2.56 | 0.979 | 239 | 217 | 239 | 228

% (OD465): carotenoid production (OD465nm) as a percentage of the vector control.
% (OD600): % of carotenoid production (OD465nm) normalized with cell density (OD600nm).
% (weight): % of carotenoid production (OD465nm) normalized with wet cell paste weight.
% (avg): % of carotenoid production (OD465nm) averaged from the normalizations with wet cell
paste weight and cell density.
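The normalization described in this Example divides each strain's OD465 by its cell density or its wet weight. It then expresses the result relative to the vector-control strain. The Python sketch below encodes that arithmetic. The dictionary keys are hypothetical names. Applied to the row for pDCQ22 #4, it gives about 157% (raw), 162% (OD600) and 152% (weight), close to the printed values. Recomputing the other rows from the listed OD600, weight and OD465 values does not reproduce every printed percentage exactly. Treat the sketch as an illustration of the normalization, not as a re-derivation of Table 2.

```python
def relative_production(sample: dict, control: dict) -> dict:
    """Carotenoid production of a strain as % of the vector control.

    Each dict holds OD465 (acetone extract), OD600 (cell density) and
    weight_g (wet cell paste weight).
    """
    raw = 100 * sample["OD465"] / control["OD465"]
    # (s465 / s600) / (c465 / c600) == (s465 / c465) * (c600 / s600)
    by_od600 = raw * control["OD600"] / sample["OD600"]
    by_weight = raw * control["weight_g"] / sample["weight_g"]
    return {"OD465": raw, "OD600": by_od600, "weight": by_weight,
            "avg": (by_od600 + by_weight) / 2}


control = {"OD465": 0.41, "OD600": 1.992, "weight_g": 2.82}   # ATCC 47072 (pRhBR171)
clone_4 = {"OD465": 0.642, "OD600": 1.93, "weight_g": 2.9}    # ATCC 47072 (pDCQ22) #4
for key, value in relative_production(clone_4, control).items():
    print(key, round(value, 1))
```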



EXAMPLE 6

Loss of Carotenoid Pigment in the Rhodococcus CrtE or CrtI Mutant
To confirm the functions of some of the genes listed in Table 1 in
carotenoid biosynthesis, gene disruption mutants of crtE and crtI were
constructed by homologous recombination. The targeted gene disruption
scheme is shown in Figure 3, using crtI as an example. PCR primers
designed based on the crtE and crtI sequences of AN12 were used to
amplify internal fragments of crtE and crtI from ATCC 47072. The primers
AN12_E_F (5'-CATGCCATGGCCTCGAAGCCTTCGTCGTG-3') (SEQ ID
NO:27) and AN12_E_R (5'-
CATGCCATGGCGCAGAGTGTCGACTTCGTT-3') (SEQ ID NO:28)
amplified an 801 bp crtE fragment with a 179 bp truncation at the N terminus and a 160 bp
truncation at the C terminus. The primers AN12_I_F (5'-
TTCATGCCATGGACTCGTCGAAGACGCTCTTG-3') (SEQ ID NO:29)
and AN12_I_R (5'-TTCATGCCATGGTGACGAGCAGTGACGGAT-3')
(SEQ ID NO:30) amplified a 910 bp crtI fragment with a 221 bp truncation at the N terminus
and a 462 bp truncation at the C terminus. The crtE and crtI fragments amplified
from ATCC 47072 were confirmed by sequencing and showed about 95%
identity at the DNA level to the crtE and crtI of AN12. The crtE fragment
and the crtI fragment were first cloned into the pCR2.1-TOPO vector
(Invitrogen, Carlsbad, CA). The TOPO clones were then digested with
NcoI (the NcoI site, CCATGG, is present in each primer sequence), and the crtE or
crtI fragments were subsequently cloned into the NcoI site of pBR328.
The resulting constructs were confirmed by sequencing and designated
pDCQ100 for the crtE clone and pDCQ101 for the crtI clone.
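Each disruption primer carries an NcoI recognition site, CCATGG, near its 5' end. The site is what allows the amplified fragments to be moved from the TOPO clones into the NcoI site of pBR328. The Python sketch below locates the site in the four primer sequences given above. NcoI's site is palindromic, so searching one strand is enough. The helper name is hypothetical.

```python
NCOI_SITE = "CCATGG"  # NcoI recognition sequence (palindromic)

primers = {
    "AN12_E_F": "CATGCCATGGCCTCGAAGCCTTCGTCGTG",
    "AN12_E_R": "CATGCCATGGCGCAGAGTGTCGACTTCGTT",
    "AN12_I_F": "TTCATGCCATGGACTCGTCGAAGACGCTCTTG",
    "AN12_I_R": "TTCATGCCATGGTGACGAGCAGTGACGGAT",
}


def site_positions(seq: str, site: str = NCOI_SITE) -> list[int]:
    """0-based start positions of every (possibly overlapping) occurrence of site in seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions


for name, seq in primers.items():
    print(name, site_positions(seq))
```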
Approximately 1 µg of DNA of pDCQ100 and pDCQ101 was introduced into
Rhodococcus ATCC 47072 by electroporation and plated on NBYE plates
with 10 µg/ml tetracycline. The pBR328 vector does not replicate in
Rhodococcus, so the tetracycline resistant transformants obtained after
3-4 days of incubation at 30°C were generated by chromosomal
integration. Integration into the targeted crtE or crtI gene on the chromosome
of ATCC 47072 was confirmed by PCR. The vector specific primers PBR3
(5'-AGCGGCATCAGCACCTTG-3') (SEQ ID NO:31) and PBR5 (5'-
GCCAATATGGACAACTTCTTC-3') (SEQ ID NO:32), paired with the gene
specific primers (outside of the insert on pDCQ100 or pDCQ101) E_OP5
(5'-ATCCGACCTCACTCGAACTGCCAG-3') (SEQ ID NO:33) and E_OP3
(5'-GGTCGGCGAGCTGACGGTTCGAGT-3') (SEQ ID NO:34) or I_OP5
(5'-CGGCCACGAAGCGAAGCTACTGAC-3') (SEQ ID NO:35) and I_OP3
(5'-ATCGTGGATGAATGGTCGGTTACG-3') (SEQ ID NO:36), were used
for PCR with chromosomal DNA prepared from the tetracycline resistant
transformants as the templates. PCR fragments of the expected sizes
were amplified from the tetracycline resistant transformants, but no PCR
product was obtained from the wild type ATCC 47072. When the two
gene specific primers were used, no PCR fragment was obtained from the
tetracycline resistant transformants, owing to the insertion of the large vector
DNA. The PCR fragments obtained with the vector specific primers and
the gene specific primers were sequenced. Sequence analysis of the
junction of the vector and the crtE or crtI gene confirmed that
single crossover recombination occurred at the expected sites and disrupted the
target genes crtE or crtI.

Next, the phenotypes of the CrtE and CrtI disruption mutants of
ATCC 47072 were analyzed. Colonies of the CrtE and CrtI disruption mutants
were pale white. It appeared that the pigments present in the wild type
strain were lost in both mutants. HPLC analysis of the carotenoids of the
mutants confirmed the visual inspection result. HPLC analysis was
performed as described in Example 4. The CrtI disruption mutant did not
have the two HPLC peaks present in the wild type strain when monitored
at 450 nm. It showed an HPLC peak at an elution time of 15.8 min when
monitored at 286 nm. The absorption maxima of this peak are 276 nm,
286 nm, and 297 nm, which are identical to those of phytoene. This peak was not
present in the wild type strain. These results confirmed the role of CrtI in
carotenoid biosynthesis. Knockout of CrtI resulted in no carotenoid
pigment, as represented by the two HPLC peaks at 450 nm. Phytoene
(colorless) accumulation in the CrtI mutant confirms the function of CrtI as
the phytoene dehydrogenase, as suggested by the BLAST search. The
CrtE mutant had neither the two HPLC peaks present in the wild type nor
the phytoene peak of the CrtI mutant. These results also confirmed the
role of CrtE in carotenoid biosynthesis. The absence of phytoene accumulation in the
CrtE mutant was consistent with the function of CrtE as geranylgeranyl
pyrophosphate synthase, which acts prior to the phytoene synthesis step
in the pathway.
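The phytoene assignment rests on matching the observed absorption maxima against known values. The Python sketch below shows that kind of lookup, using only maxima reported in this document. The 2 nm tolerance is an assumption, as is the rule that the number of maxima must agree. It is a screening aid at most, since distinct carotenoids can share similar spectra. That is why the text also relies on elution time and mass.

```python
# Absorption maxima (nm) reported in this document.
REFERENCE_MAXIMA = {
    "phytoene": (276, 286, 297),
    "ATCC 47072 major carotenoid": (465,),
    "ATCC 47072 minor carotenoid": (435, 458, 486),
}


def spectrum_matches(observed, reference, tol_nm=2.0):
    """True if both lists have the same number of maxima and each pair agrees within tol_nm."""
    obs, ref = sorted(observed), sorted(reference)
    return len(obs) == len(ref) and all(abs(o - r) <= tol_nm for o, r in zip(obs, ref))


def candidate_compounds(observed, library=REFERENCE_MAXIMA, tol_nm=2.0):
    """Names of all reference spectra compatible with the observed maxima."""
    return [name for name, ref in library.items() if spectrum_matches(observed, ref, tol_nm)]


print(candidate_compounds((276, 286, 297)))  # ['phytoene']
```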

Table 3. Summary of the phenotypes of the Crt knockout mutants of
ATCC 47072

Strain | Colony color | Carotenoid analysis by HPLC (450 nm) | Phytoene intermediate
Wild type | Pink | Major (465 nm) at 14.6 min; minor (435 nm, 458 nm, 486 nm) at 15.6 min | No
CrtI | White | No peaks | Yes
CrtE | White | No peaks | No



EXAMPLE 7

Inhibition of the CrtL-type of Lycopene Cyclase in Rhodococcus

Since the lycopene cyclase identified in Rhodococcus erythropolis strain
AN12 showed high sequence similarity to the CrtL-type of lycopene cyclases in
plants and cyanobacteria (Example 3), it was decided to determine whether the
lycopene cyclase in Rhodococcus was also functionally related to the CrtL-type
of lycopene cyclases. The tri-alkyl amine compounds 2-(4-methylphenoxy)-
triethylamine hydrochloride (MPTA) and 2-(4-chlorophenylthio)-triethylamine
hydrochloride (CPTA) have been shown to specifically inhibit the CrtL-type of
lycopene cyclases but not the non-photosynthetic bacterial CrtY-type of lycopene
cyclases (Cunningham, Jr., et al. Molecular structure and enzymatic function of
lycopene cyclase from the Cyanobacterium Synechococcus sp. strain PCC7942.
The Plant Cell, 1994, Vol. 6:1107). The effect of
MPTA or CPTA on carotenoid production in Rhodococcus erythropolis was therefore examined. One ml
of overnight cultured ATCC 47072 cells was added to 200 ml LB medium with
0.05% Tween-80, without or with 40 µM CPTA or MPTA inhibitor, and cultured at
30°C with shaking for 24 hr. Cells were spun down at 4000 g for 15 min, and the
cell pellet was resuspended in 10 ml acetone. Carotenoids were extracted into
acetone with constant shaking at room temperature for 1 hr, followed by spinning
down the cell debris at 4000 g for 15 min. The extraction was repeated once,
and the supernatants of both extractions were combined and dried under
nitrogen. The dried material was re-dissolved in 1 ml methanol, and insoluble
material was removed by spinning at 16,000 g for 2 min in a microcentrifuge.
0.1 ml of the sample was used for HPLC analysis as described in Example 4.
Results are summarized in Table 4.

In the absence of any inhibitor, Rhodococcus ATCC 47072 produced the
same carotenoids as described in Example 4. In the presence of 40 µM CPTA or
MPTA, the major peak appeared at 15.3 min with absorption maxima at 443,
469, and 500 nm. The authentic lycopene standard from Sigma (St. Louis, MO)
showed similar properties under the same conditions (eluted at 15.3 min with
absorption maxima at 443, 469, and 500 nm). These results confirmed that lycopene is the
substrate of the cyclase in Rhodococcus and that the Rhodococcus lycopene cyclase
could be inhibited by the inhibitors specific for the CrtL-type of cyclases in
photosynthetic bacteria and plants. In the presence of 40 µM CPTA, the
inhibition was estimated to be 95%, and a small amount (5% of total carotenoids)
of the wild type major carotenoid was still observed. In the presence of 40 µM
MPTA, the inhibition was estimated to be 82%, and 18% of the total carotenoids
was the wild type major carotenoid.

Table 4. Inhibition of lycopene cyclase in Rhodococcus ATCC 47072.

ATCC 47072 | Major peak | Minor peak
No inhibitor | 14.6 min (465 nm), 87% | 15.6 min (437, 459, 486 nm), 13%
40 µM CPTA | 15.3 min (443, 469, 500 nm), 95% | 14.5 min (465 nm), 5%
40 µM MPTA | 15.3 min (443, 469, 500 nm), 82% | 14.5 min (465 nm), 18%
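The inhibition figures above correspond to the share of total carotenoid that remains as uncyclized lycopene: 95% lycopene with CPTA and 82% with MPTA. The Python sketch below shows that bookkeeping. It assumes that each percentage is a peak's area divided by the summed areas of all carotenoid peaks; the text does not state the detection wavelength or the integration method. The peak areas are hypothetical.

```python
def percent_of_total(peak_areas: dict[str, float]) -> dict[str, float]:
    """Express each carotenoid peak as a percentage of the summed peak areas."""
    total = sum(peak_areas.values())
    if total <= 0:
        raise ValueError("total peak area must be positive")
    return {name: 100.0 * area / total for name, area in peak_areas.items()}


def estimated_inhibition(peak_areas: dict[str, float], substrate: str = "lycopene") -> float:
    """Cyclase inhibition taken as the share of total carotenoid left as uncyclized substrate."""
    return percent_of_total(peak_areas)[substrate]


# Hypothetical peak areas for a CPTA-treated culture (arbitrary units).
cpta = {"lycopene": 19.0, "major carotenoid": 1.0}
print(estimated_inhibition(cpta))  # 95.0
```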
CLAIMS

What is claimed is:

1. An isolated nucleic acid molecule selected from the group
consisting of:
(a) an isolated nucleic acid molecule encoding an isoprenoid
biosynthetic enzyme, having an amino acid sequence
selected from the group consisting of SEQ ID NOs:2, 4, 6,
8, 10, 12, 14, 16, 18 and 20;
(b) an isolated nucleic acid molecule encoding an isoprenoid
biosynthetic enzyme that hybridizes with (a) under the
following hybridization conditions: 0.1X SSC, 0.1% SDS,
65°C and washed with 2X SSC, 0.1% SDS followed by
0.1X SSC, 0.1% SDS; or
(c) an isolated nucleic acid molecule that is complementary to (a)
or (b).

2. The isolated nucleic acid molecule of Claim 1 selected from the
group consisting of SEQ ID NOs:1, 3, 5, 7, 9, 11, 13, 15, 17 and 19.

3. A polypeptide encoded by the isolated nucleic acid molecule of
Claim 1.

4. The polypeptide of Claim 3 selected from the group consisting
of SEQ ID NOs:2, 4, 6, 8, 10, 12, 14, 16, 18 and 20.

5. An isolated nucleic acid molecule comprising a first nucleotide
sequence encoding a polypeptide of at least 648 amino acids that has at
least 70% identity based on the Smith-Waterman method of alignment
when compared to a polypeptide having the sequence as set forth in SEQ
ID NO:2, or a second nucleotide sequence comprising the complement of
the first nucleotide sequence.

6. An isolated nucleic acid molecule comprising a first nucleotide
sequence encoding a polypeptide of at least 385 amino acids that has at
least 71% identity based on the Smith-Waterman method of alignment
when compared to a polypeptide having the sequence as set forth in SEQ
ID NO:4, or a second nucleotide sequence comprising the complement of
the first nucleotide sequence.

7. An isolated nucleic acid molecule comprising a first nucleotide
sequence encoding a polypeptide of at least 232 amino acids that has at
least 70% identity based on the Smith-Waterman method of alignment
when compared to a polypeptide having the sequence as set forth in SEQ
ID NO:6, or a second nucleotide sequence comprising the complement of
the first nucleotide sequence.

8. An isolated nucleic add molecule comprising a first nucleotide 
sequence encoding a polypeptide of at least 31 1 amino acids that has at 

5 least 70% identity based on the Smith-Watennan method of alignment 
when compared to a polypeptide having the sequence as set forth In SEQ 
ID N0:8 or a second nucleotide sequence comprising the complement of 
the first nucleotide sequence. 

9. An isolated nudeic acid molecule comprising a first nudeotide 
10 sequence encoding a polypeptide of at least 1 58 amino adds that has at 

least 70% identity based on the Smith-Waterman method of alignment 
when compared to a polypeptide having the sequence as set forth in SEQ 
ID NO:10 or a second nudeotide sequence comprising the complement of 
the first nucleotide sequence. 

15 10. An isolated nudeic add molecule comprising a first nucleotide 

sequence encoding a polypeptide of at least 344 amino acids that has at 
least 70% identity based on the Smith-Watemnan method of alignment 
when compared to a polypeptide having the sequence as set forth in SEQ 
ID NO:12 or a second nudeotide sequence comprising the complement of 

20 ttie first nudeotide sequence. 

11. An isolated nucleic acid molecule comprising a first nucleotide
sequence encoding a polypeptide of at least 378 amino acids that has at
least 70% identity based on the Smith-Waterman method of alignment
when compared to a polypeptide having the sequence as set forth in SEQ
ID NO:14 or a second nucleotide sequence comprising the complement of
the first nucleotide sequence.

12. An isolated nucleic acid molecule comprising a first nucleotide
sequence encoding a polypeptide of at least 314 amino acids that has
at least 70% identity based on the Smith-Waterman method of alignment
when compared to a polypeptide having the sequence as set forth in SEQ
ID NO:16 or a second nucleotide sequence comprising the complement of
the first nucleotide sequence.

13. An isolated nucleic acid molecule comprising a first nucleotide
sequence encoding a polypeptide of at least 530 amino acids that has at
least 70% identity based on the Smith-Waterman method of alignment
when compared to a polypeptide having the sequence as set forth in SEQ
ID NO:18 or a second nucleotide sequence comprising the complement of
the first nucleotide sequence.
14. An isolated nucleic acid molecule comprising a first nucleotide
sequence encoding a polypeptide of at least 376 amino acids that has at
least 70% identity based on the Smith-Waterman method of alignment
when compared to a polypeptide having the sequence as set forth in SEQ
ID NO:20 or a second nucleotide sequence comprising the complement of
the first nucleotide sequence.

15. A chimeric gene comprising the isolated nucleic acid molecule 
of any one of Claims 1 or 5-14 operably linked to suitable regulatory 
sequences. 

16. A transformed host cell comprising the chimeric gene of
Claim 15.

17. The transformed host cell of Claim 16 wherein the host cell is 
selected from the group consisting of bacteria, yeast, filamentous fungi, 
algae, and green plants. 

18. The transformed host cell of Claim 17 wherein the host cell is
selected from the group consisting of Aspergillus, Trichoderma,
Saccharomyces, Pichia, Candida, Hansenula, or bacterial species such as
Salmonella, Bacillus, Acinetobacter, Zymomonas, Agrobacterium,
Erythrobacter, Chlorobium, Chromatium, Flavobacterium, Cytophaga,
Rhodobacter, Rhodococcus, Streptomyces, Brevibacterium,
Corynebacteria, Mycobacterium, Deinococcus, Escherichia, Erwinia,
Pantoea, Pseudomonas, Sphingomonas, Methylomonas, Methylobacter,
Methylococcus, Methylosinus, Methylomicrobium, Methylocystis,
Alcaligenes, Synechocystis, Synechococcus, Anabaena, Myxococcus,
Thiobacillus, Methanobacterium and Klebsiella.

19. The transformed host cell of Claim 17 wherein the host cell is
selected from the group consisting of Spirulina, Haematococcus, and
Dunaliella.

20. The transformed host cell of Claim 17 wherein the host cell is
selected from the group consisting of soybean, rapeseed, sunflower,
cotton, corn, tobacco, alfalfa, wheat, barley, oats, sorghum, rice,
Arabidopsis, cruciferous vegetables, melons, carrots, celery, parsley,
tomatoes, potatoes, strawberries, peanuts, grapes, grass seed crops,
sugar beets, sugar cane, beans, peas, rye, flax, hardwood trees, softwood
trees, and forage grasses.

21. A method of obtaining a nucleic acid molecule encoding an
isoprenoid compound biosynthetic enzyme comprising:

(a) probing a genomic library with the nucleic acid molecule of
any one of Claims 1 or 5-14;

(b) identifying a DNA clone that hybridizes with the nucleic
acid molecule of any one of Claims 1 or 5-14; and

(c) sequencing the genomic fragment that comprises the
clone identified in step (b),
wherein the sequenced genomic fragment encodes an isoprenoid
biosynthetic enzyme.

22. A method of obtaining a nucleic acid molecule encoding an
isoprenoid biosynthetic enzyme comprising:

(a) synthesizing at least one oligonucleotide primer
corresponding to a portion of the sequence selected from
the group consisting of SEQ ID NOs:1, 3, 5, 7, 9, 11, 13,
15, 17 and 19; and

(b) amplifying an insert present in a cloning vector using the
oligonucleotide primer of step (a);
wherein the amplified insert encodes a portion of the amino acid sequence
of an isoprenoid biosynthetic enzyme.

23. The product of the method of Claims 21 or 22.

24. A method for the production of isoprenoid compounds
comprising: contacting a transformed host cell under suitable growth
conditions with an effective amount of a fermentable carbon substrate
whereby an isoprenoid compound is produced, said transformed host cell
comprising a set of nucleic acid molecules encoding SEQ ID NOs:2, 4, 6,
8, 10, 12, 14, 16, 18 and 20 under the control of suitable regulatory
sequences.

25. A method according to Claim 24 wherein the transformed host
is selected from the group consisting of bacteria, yeast, filamentous fungi,
algae, and green plants.

26. A method according to Claim 25 wherein the transformed
host cell is selected from the group consisting of Aspergillus, Trichoderma,
Saccharomyces, Pichia, Candida, Hansenula, or bacterial species such as
Salmonella, Bacillus, Acinetobacter, Zymomonas, Agrobacterium,
Erythrobacter, Chlorobium, Chromatium, Flavobacterium, Cytophaga,
Rhodobacter, Rhodococcus, Streptomyces, Brevibacterium,
Corynebacteria, Mycobacterium, Deinococcus, Escherichia, Erwinia,
Pantoea, Pseudomonas, Sphingomonas, Methylomonas, Methylobacter,
Methylococcus, Methylosinus, Methylomicrobium, Methylocystis,
Alcaligenes, Synechocystis, Synechococcus, Anabaena, Myxococcus,
Thiobacillus, Methanobacterium and Klebsiella.

27. A method according to Claim 25 wherein the transformed host
cell is selected from the group consisting of Spirulina, Haematococcus,
and Dunaliella.

28. A method according to Claim 25 wherein the transformed host
cell is selected from the group consisting of soybean, rapeseed, sunflower,
cotton, corn, tobacco, alfalfa, wheat, barley, oats, sorghum, rice,
Arabidopsis, cruciferous vegetables, melons, carrots, celery, parsley,
tomatoes, potatoes, strawberries, peanuts, grapes, grass seed crops,
sugar beets, sugar cane, beans, peas, rye, flax, hardwood trees, softwood
trees, and forage grasses.

29. A method of regulating isoprenoid biosynthesis in an organism
comprising over-expressing at least one isoprenoid gene selected from
the group consisting of SEQ ID NOs:1, 3, 5, 7, 9, 11, 13, 15, 17 and 19 in
an organism such that the isoprenoid biosynthesis is altered in the
organism.

30. A method according to Claim 29 wherein said isoprenoid gene 
is over-expressed on a multicopy plasmid. 

31. A method according to Claim 29 wherein said isoprenoid gene
is operably linked to an inducible or regulated promoter.

32. A method according to Claim 29 wherein said isoprenoid gene 
is expressed in antisense orientation. 

33. A method according to Claim 29 wherein said isoprenoid gene
is disrupted by insertion of foreign DNA into the coding region.

34. A mutated gene encoding an isoprenoid enzyme having an
altered biological activity produced by a method comprising the steps of:

(i) digesting a mixture of nucleotide sequences with
restriction endonucleases wherein said mixture comprises:

a) a native isoprenoid gene;

b) a first population of nucleotide fragments which will
hybridize to said native isoprenoid gene;

c) a second population of nucleotide fragments which will
not hybridize to said native isoprenoid gene;

wherein a mixture of restriction fragments is produced;

(ii) denaturing said mixture of restriction fragments;

(iii) incubating the denatured mixture of restriction
fragments of step (ii) with a polymerase; and

(iv) repeating steps (ii) and (iii) wherein a mutated isoprenoid
gene is produced encoding a protein having an altered
biological activity.
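The identity thresholds recited in the claims above are measured with the Smith-Waterman local alignment method. The following Python sketch shows one way such a percent identity might be computed. It is not the scoring scheme behind the claims, which this section does not specify.

The sketch makes several assumptions:

- It uses simple match/mismatch scores and a linear gap penalty. The function names `smith_waterman` and `percent_identity` and the parameters `match`, `mismatch` and `gap` are hypothetical.
- It defines identity as identical aligned positions divided by the full local-alignment length, gaps included.
- A real comparison against, for example, SEQ ID NO:4 would normally use a protein substitution matrix such as BLOSUM62 with affine gap penalties.

```python
from typing import Tuple


def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2) -> Tuple[int, str, str]:
    """Best local alignment of a and b using a linear gap penalty.

    Returns (score, aligned_a, aligned_b); '-' marks a gap.
    Scoring values are illustrative assumptions, not those of the claims.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # h[i][j]: best score of a local alignment ending at a[i-1] and b[j-1].
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)

    def sub(i: int, j: int) -> int:
        # Substitution score for pairing a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, rows):
        for j in range(1, cols):
            h[i][j] = max(0,
                          h[i - 1][j - 1] + sub(i, j),  # match or mismatch
                          h[i - 1][j] + gap,            # gap in b
                          h[i][j - 1] + gap)            # gap in a
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the local alignment.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        if h[i][j] == h[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(a: str, b: str) -> float:
    """Identical aligned positions as a percentage of the alignment length."""
    _, x, y = smith_waterman(a, b)
    if not x:
        return 0.0  # no positive-scoring local alignment
    same = sum(1 for p, q in zip(x, y) if p == q and p != "-")
    return 100.0 * same / len(x)


print(percent_identity("ACGTACGT", "ACGTTCGT"))  # 87.5
```

Identity computed over a local alignment can reach 100% for a short matching fragment. The length of the aligned region therefore matters as much as the percentage.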
SEQUENCE LISTING



<110> E.I. du Pont de Nemours and Company

<120> Genes Involved in Isoprenoid Compound Production 

<130> CL1788 PCT 

<150> 60/285,910 
<151> 2001-APR-24 

<160> 36 

<170> Microsoft Office 97 

<210> 1 
<211> 1947 
<212> DNA 

<213> Rhodococcus erythropolis 
<400> 1 

ttgggtgttc ttgcccgcat tcagggtcct gacgatctac gtcagttgag ccacgccgag 60 

atgacggagt tggccgacga gattcgtgag ttcctcgtgc tgaaggtcgc tgcgaccggt 120 

ggtcacctcg ggcccaactt gggcgtcgtg gagttgaccc tcgcactgca ccgaattttc 180 

gactcgccgc aggacgcgat catcttcgac acgggccatc aggcctacgt gcacaagatc 240 

ctcaccggtc gtcaggatca gttcgacact ctgcgtaagc agggcggact gtccgggtat 300 

ccgtgccgcg ccgagagcga acacgactgg gtcgagtcct ctcacgcttc cgccgcgttg 360 

tcctatgccg acggcctcgc gaaggccttc gcgctcacgg gccagaatcg ccacgttgtc 420 

gccgtcgtcg gtgacggcgc cctgaccggc ggaatgtgtt gggaagccct caacaacatc 480 

gcagccggaa aagaccgttc ggtggtgatc gtcgtcaacg acaacggccg ctcgtacgcg 540

ccgaccatcg gcggcctcgc cgaccatctt tcggcactgc gcaccgcgcc gagttacgag 600 

cgcgccctcg acagtggccg acgcatggtc aagagactgc cctgggtggg gcgcaccgcg 660 

tactccgtcc tgcacggaat gaaggcgggt ctcaaggacg ctgtcagccc tcaggtcatg 720 

ttcaccgatc tgggtatcaa gtacctcgga ccggtcgacg gtcacgacga agccgccatg 780 

gaatcggcgt tgcgccgggc gaaggcctac ggcggaccgg tcatcgttca tgccgtcact 840 

cgtaagggca acggttacgc acacgccgag aacgacgtgg ccgaccagat gcatgccacc 900 

ggcgtcatcg atcccgtcac cggtcgcggc accaagtcgt ccgcgccgga ctggacgtcg 960 

gtcttctcgg ccgcattgat cgagcaggct tcgcgtcgtg aggacattgt cgccatcacc 1020 

gcggcgatgg ccgggcccac cggcctcgcg gccttcgggg agaagttccc cgatcggatt 1080 

ttcgacgtcg gtatcgccga gcagcatgcg atgacctcgg ccgccggtct tgcacttggc 1140 

ggacttcacc ccgtcgttgc tatctactcg accttcctca atcgggcttt cgaccagttg 1200 

ttgatggacg tcgcactgct caaacaaccg gtgacagtcg tgctcgaccg cgccggggtc 1260 

accggagtcg acggcgccag ccacaacggc gtctgggatc tttcgctgct cggaatcatc 1320 

ccggggattc gcgtcgcggc accgcgtgat gcagacacac tgcgggaaga gttggacgag 1380 

gcgcttctcg tcgacgacgg cccaacggtc gtacggttcc cgaagggtgc tgtacccgaa 1440 

gcgattccgg cagtgaagcg actcgacgga atggtcgacg tcctcaaggc cagcgagggt 1500

gagcgcggcg acgtgctcct cgtcgcggtg ggcccatttg catccttggc gctcgagatt 1560

gccgagcggc tcgacaagca gggcatctcg gttgccgtcg ttgatccgcg atgggttctg 1620 

ccggtcgcgg attcgctggt gaagatggcg gacaagtacg ccctcgtggt caccatcgaa 1680 

gacggcggtt tgcacggcgg catcggttcg acggtctcgg ccgcgatgcg tgccgccgga 1740 

gtgcacacgt cgtgccgcga catgggcgtt ccccagcagt tcctcgatca cgccagccgc 1800 

gaagccatcc acaaggaact cggactcacg gctcaggacc tctcccgcaa gatcaccggc 1860 

tgggtggcgg ggatgggcag cgtcggcgtc cacgtccagg aagacgcgtc ctcggcttcg 1920 

gctcagggcg aagtcgcgca aggctga 1947 

<210> 2 
<211> 648 
<212> PRT 

<213> Rhodococcus erythropolis 
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<400> 2 

Met Gly Val Leu Ala Arg Ile Gln Gly Pro Asp Asp Leu Arg Gln Leu
1 5 10 15

Ser His Ala Glu Met Thr Glu Leu Ala Asp Glu Ile Arg Glu Phe Leu
20 25 30

Val Leu Lys Val Ala Ala Thr Gly Gly His Leu Gly Pro Asn Leu Gly
35 40 45

Val Val Glu Leu Thr Leu Ala Leu His Arg Ile Phe Asp Ser Pro Gln
50 55 60

Asp Ala Ile Ile Phe Asp Thr Gly His Gln Ala Tyr Val His Lys Ile
65 70 75 80

Leu Thr Gly Arg Gln Asp Gln Phe Asp Thr Leu Arg Lys Gln Gly Gly
85 90 95 

Leu Ser Gly Tyr Pro Cys Arg Ala Glu Ser Glu His Asp Trp Val Glu 
100 105 110 

Ser Ser His Ala Ser Ala Ala Leu Ser Tyr Ala Asp Gly Leu Ala Lys 
115 120 125 

Ala Phe Ala Leu Thr Gly Gln Asn Arg His Val Val Ala Val Val Gly
130 135 140

Asp Gly Ala Leu Thr Gly Gly Met Cys Trp Glu Ala Leu Asn Asn Ile
145 150 155 160

Ala Ala Gly Lys Asp Arg Ser Val Val Ile Val Val Asn Asp Asn Gly
165 170 175

Arg Ser Tyr Ala Pro Thr Ile Gly Gly Leu Ala Asp His Leu Ser Ala
180 185 190 

Leu Arg Thr Ala Pro Ser Tyr Glu Arg Ala Leu Asp Ser Gly Arg Arg 
195 200 205

Met Val Lys Arg Leu Pro Trp Val Gly Arg Thr Ala Tyr Ser Val Leu 
210 215 220 

His Gly Met Lys Ala Gly Leu Lys Asp Ala Val Ser Pro Gln Val Met
225 230 235 240

Phe Thr Asp Leu Gly Ile Lys Tyr Leu Gly Pro Val Asp Gly His Asp
245 250 255 

Glu Ala Ala Met Glu Ser Ala Leu Arg Arg Ala Lys Ala Tyr Gly Gly 
260 265 270 

Pro Val Ile Val His Ala Val Thr Arg Lys Gly Asn Gly Tyr Ala His
275 280 285

Ala Glu Asn Asp Val Ala Asp Gln Met His Ala Thr Gly Val Ile Asp
290 295 300

Pro Val Thr Gly Arg Gly Thr Lys Ser Ser Ala Pro Asp Trp Thr Ser
305 310 315 320

Val Phe Ser Ala Ala Leu Ile Glu Gln Ala Ser Arg Arg Glu Asp Ile
325 330 335

Val Ala Ile Thr Ala Ala Met Ala Gly Pro Thr Gly Leu Ala Ala Phe
340 345 350

Gly Glu Lys Phe Pro Asp Arg Ile Phe Asp Val Gly Ile Ala Glu Gln
355 360 365 

His Ala Met Thr Ser Ala Ala Gly Leu Ala Leu Gly Gly Leu His Pro 
370 375 380 

Val Val Ala Ile Tyr Ser Thr Phe Leu Asn Arg Ala Phe Asp Gln Leu
385 390 395 400

Leu Met Asp Val Ala Leu Leu Lys Gln Pro Val Thr Val Val Leu Asp
405 410 415 

Arg Ala Gly Val Thr Gly Val Asp Gly Ala Ser His Asn Gly Val Trp 
420 425 430 

Asp Leu Ser Leu Leu Gly Ile Ile Pro Gly Ile Arg Val Ala Ala Pro
435 440 445 

Arg Asp Ala Asp Thr Leu Arg Glu Glu Leu Asp Glu Ala Leu Leu Val 
450 455 460 

Asp Asp Gly Pro Thr Val Val Arg Phe Pro Lys Gly Ala Val Pro Glu 
465 470 475 480 

Ala Ile Pro Ala Val Lys Arg Leu Asp Gly Met Val Asp Val Leu Lys
485 490 495 

Ala Ser Glu Gly Glu Arg Gly Asp Val Leu Leu Val Ala Val Gly Pro 
500 505 510 

Phe Ala Ser Leu Ala Leu Glu Ile Ala Glu Arg Leu Asp Lys Gln Gly
515 520 525

Ile Ser Val Ala Val Val Asp Pro Arg Trp Val Leu Pro Val Ala Asp
530 535 540

Ser Leu Val Lys Met Ala Asp Lys Tyr Ala Leu Val Val Thr Ile Glu
545 550 555 560

Asp Gly Gly Leu His Gly Gly Ile Gly Ser Thr Val Ser Ala Ala Met
565 570 575

Arg Ala Ala Gly Val His Thr Ser Cys Arg Asp Met Gly Val Pro Gln
580 585 590

Gln Phe Leu Asp His Ala Ser Arg Glu Ala Ile His Lys Glu Leu Gly
595 600 605

Leu Thr Ala Gln Asp Leu Ser Arg Lys Ile Thr Gly Trp Val Ala Gly
610 615 620

Met Gly Ser Val Gly Val His Val Gln Glu Asp Ala Ser Ser Ala Ser
625 630 635 640

Ala Gln Gly Glu Val Ala Gln Gly
645 

<210> 3 
<211> 1158 
<212> DNA

<213> Rhodococcus erythropolis 
<400> 3 

gtgcaggaaa ccacacgtac ccgcgtcctc ctcctcggca gtaccggttc gatcggtacc 60 

caagcgctgg aggtcatcgc agccaacccc gatcgtttcg aagtagtcgg tctcgcagcg 120 

ggcggcaaca acgtcgagtt gttgggcgaa cagattcgtg caaccggcgt cacggacgtc 180 

gccgtcgccg atcctgcagc ggcatcggcg ctggaatcgg taaccgcccg ttcgggaccg 240 

agcgccgtga cggaactggt tcgggacagc ggtgccgatg ttgtcctcaa tgcactcgtc 300 

ggttcgttgg gactcgaacc gactctggcg gcgctgaact cgggagcgcg cctggcgctg 360 

gcgaacaagg aatcgcttgt cgccggcgga gcgctggtga ccaaagccgc cgcacccggt 420 

cagatcgtgc cggtcgactc ggagcattcg gcgcttgccc agtgtctacg tggtggaaca 480 

ggcgacgaag tggctcggtt ggttctcacc gcttcgggtg gaccgttccg tggctggagc 540 

gccgaggatc tcgaaagtgt gaatccagct caggcaaaag cgcaccccac ctggtcgatg 600 

gggcccatga acaccctcaa ttcggcaact ctggtcaaca agggcctcga gctgatcgag 660 

acgaacctgc tgttcgggat cgactacgac cgcatcgacg tcaccgtgca cccgcagtcg 720 

atcgtgcatt ccatggtgac cttcttcgac gggtcgacgc tggcacaggc aagcccgccg 780 

gacatgaagc tcccgatcgc tctcgctctc ggctggccgg accgcatcga aggtgctgcg 840 

tcggcatgcg acttcaccac cgcctccacc tgggaattcg agccgctcga ttcgtcggtg 900 

ttccccgccg tcgatctggc gcgaagcgcg ggcaaatccg gcggttgctt caccgcgatc 960 

tacaacgcgg ccaacgaagt ggcggctcag gcattcctcg acggtgtcat ttccttcccg 1020 

gcgatcgtcc gcacggtggc cgctgttctc gacgatgcag gtcaatggtc cgcggaaccg 1080 

gttaccgtgg acgacgttct ggccgcagac ggctgggcac gcacacgagc gcgtcagctc 1140 

gtgaagcagg agggctag 1158 

<210> 4 
<211> 385 
<212> PRT 

<213> Rhodococcus erythropolis 

<400> 4

Met Gln Glu Thr Thr Arg Thr Arg Val Leu Leu Leu Gly Ser Thr Gly
1 5 10 15

Ser Ile Gly Thr Gln Ala Leu Glu Val Ile Ala Ala Asn Pro Asp Arg
20 25 30

Phe Glu Val Val Gly Leu Ala Ala Gly Gly Asn Asn Val Glu Leu Leu
35 40 45

Gly Glu Gln Ile Arg Ala Thr Gly Val Thr Asp Val Ala Val Ala Asp
50 55 60 

Pro Ala Ala Ala Ser Ala Leu Glu Ser Val Thr Ala Arg Ser Gly Pro 
65 70 75 80 

Ser Ala Val Thr Glu Leu Val Arg Asp Ser Gly Ala Asp Val Val Leu 
85 90 95 

Asn Ala Leu Val Gly Ser Leu Gly Leu Glu Pro Thr Leu Ala Ala Leu 
100 105 110 
Asn Ser Gly Ala Arg Leu Ala Leu Ala Asn Lys Glu Ser Leu Val Ala
115 120 125

Gly Gly Ala Leu Val Thr Lys Ala Ala Ala Pro Gly Gln Ile Val Pro
130 135 140

Val Asp Ser Glu His Ser Ala Leu Ala Gln Cys Leu Arg Gly Gly Thr
145 150 155 160 

Gly Asp Glu Val Ala Arg Leu Val Leu Thr Ala Ser Gly Gly Pro Phe 
165 170 175 

Arg Gly Trp Ser Ala Glu Asp Leu Glu Ser Val Asn Pro Ala Gln Ala
180 185 190 

Lys Ala His Pro Thr Trp Ser Met Gly Pro Met Asn Thr Leu Asn Ser 
195 200 205 

Ala Thr Leu Val Asn Lys Gly Leu Glu Leu Ile Glu Thr Asn Leu Leu
210 215 220

Phe Gly Ile Asp Tyr Asp Arg Ile Asp Val Thr Val His Pro Gln Ser
225 230 235 240

Ile Val His Ser Met Val Thr Phe Phe Asp Gly Ser Thr Leu Ala Gln
245 250 255

Ala Ser Pro Pro Asp Met Lys Leu Pro Ile Ala Leu Ala Leu Gly Trp
260 265 270

Pro Asp Arg Ile Glu Gly Ala Ala Ser Ala Cys Asp Phe Thr Thr Ala
275 280 285 

Ser Thr Trp Glu Phe Glu Pro Leu Asp Ser Ser Val Phe Pro Ala Val 
290 295 300 

Asp Leu Ala Arg Ser Ala Gly Lys Ser Gly Gly Cys Phe Thr Ala Ile
305 310 315 320

Tyr Asn Ala Ala Asn Glu Val Ala Ala Gln Ala Phe Leu Asp Gly Val
325 330 335

Ile Ser Phe Pro Ala Ile Val Arg Thr Val Ala Ala Val Leu Asp Asp
340 345 350

Ala Gly Gln Trp Ser Ala Glu Pro Val Thr Val Asp Asp Val Leu Ala
355 360 365

Ala Asp Gly Trp Ala Arg Thr Arg Ala Arg Gln Leu Val Lys Gln Glu
370 375 380 

Gly 
385 

<210> 5 
<211> 699 
<212> DNA 

<213> Rhodococcus erythropolis






wo 02/086094 PCT/US02/15033 
<400> 5 

gtggcagtag tagccctggt acctgccgca ggtcggggag tgcgattggg cgagaaattg 60

cccaaggcat ttgtcgaact cggtgggtgc accatgcttg cacgcgcggt cgatggactc 120 

cggaaatccg gagcgatcga ccgcgttgtt gtcattgtgc cgcctgaact ggtcgaatcc 180 

gtcgtggccg acctcggtcg tgcatcggac gtcgacgtcg tcggtggtgg tgccgaaaga 240 

accgattcgg ttcgagccgg tctcagtgct gccggcgacg cagattttgt actcgtgcac 300 

gacgccgcgc gggcattgac gccgccggcg ttgatcgcgc gcgtcgtcga cgctctccga 360 

gccggcagca gcgctgtcat cccggtactc ccggttaccg acacgatcaa gtcggtcgac 420 

gtactcggcg cagtcaccgg aacgcctctg cgttcggagt tgcgtgcggt tcaaactcct 480 

caaggcttct ccaccgacgt cctgcgcagt gcgtacgacg ccggtgatgt cgccgcgacc 540 

gacgacgccg ctctggtgga gcgtctcggt gtttcggtgc agacgattcc cggcgacgct 600 

ctcgccttca agatcaccac tccgctcgac ctcgtccttg cacgggcgct cctgatctcg 660 

gagacagagt tgagcgcgga ctcacaggac ggaaaatag 699 

<210> 6 
<211> 232 
<212> PRT 

<213> Rhodococcus erythropolis 
<400> 6

Met Ala Val Val Ala Leu Val Pro Ala Ala Gly Arg Gly Val Arg Leu
1 5 10 15

Gly Glu Lys Leu Pro Lys Ala Phe Val Glu Leu Gly Gly Cys Thr Met
20 25 30

Leu Ala Arg Ala Val Asp Gly Leu Arg Lys Ser Gly Ala Ile Asp Arg
35 40 45

Val Val Val Ile Val Pro Pro Glu Leu Val Glu Ser Val Val Ala Asp
50 55 60 

Leu Gly Arg Ala Ser Asp Val Asp Val Val Gly Gly Gly Ala Glu Arg 
65 70 75 80 

Thr Asp Ser Val Arg Ala Gly Leu Ser Ala Ala Gly Asp Ala Asp Phe 
85 90 95 

Val Leu Val His Asp Ala Ala Arg Ala Leu Thr Pro Pro Ala Leu Ile
100 105 110

Ala Arg Val Val Asp Ala Leu Arg Ala Gly Ser Ser Ala Val Ile Pro
115 120 125

Val Leu Pro Val Thr Asp Thr Ile Lys Ser Val Asp Val Leu Gly Ala
130 135 140

Val Thr Gly Thr Pro Leu Arg Ser Glu Leu Arg Ala Val Gln Thr Pro
145 150 155 160

Gln Gly Phe Ser Thr Asp Val Leu Arg Ser Ala Tyr Asp Ala Gly Asp
165 170 175 

Val Ala Ala Thr Asp Asp Ala Ala Leu Val Glu Arg Leu Gly Val Ser 
180 185 190 

Val Gln Thr Ile Pro Gly Asp Ala Leu Ala Phe Lys Ile Thr Thr Pro
195 200 205

Leu Asp Leu Val Leu Ala Arg Ala Leu Leu Ile Ser Glu Thr Glu Leu
210 215 220

Ser Ala Asp Ser Gln Asp Gly Lys
225 230 

<210> 7 
<211> 936 
<212> DNA

<213> Rhodococcus erythropolis
<400> 7 

gtgctctccg tcgttcctcg ccccgtagtt gtccgggccc cgtccaaggt gaatctccac 60

cttgccgtcg gggacctgcg agacgacggc tatcacgaac tgacgaccgt ttttcaggca 120 

ttgtcgctgg cagacactgt cacggtggcg cctgcggaca ccttgaccgt gcgggtgatc 180 

ggcgacgacg ccgcggccgt accgaccgat cgcaccaatc tcgtgtggcg tgccgccgag 240 

atgcttgcgg ccgagggtgg cgtggccccg aatgtcgaga tcgtcatcga gaagggcatt 300 

cccgtcgcag gcggtatggc cggcgggagc gccgacgcgg cagccgcgtt ggttgcgctc 360 

aattcgttgt ggaaactcga cttctcgcgg cctgatctcg acgccttcgc ggcacgtctc 420 

gggagtgacg ttccgttctc gctgcacggt ggcactgccc tcgggaccgg tcgcggtgaa 480 

caacttgtcc ccgtcttgac gcgccgcacc tttcactggg tgttggcgct ggccaaggga 540 

ggcttgagca cgccggttgt cttccgggaa ctcgacaagc ttcgcgccga aggcacaccg 600 

aatcgattgg gtaccgctga cgagttgatt cacgcgctca ccaccggtga ccctcatgtg 660 

ctcgccccgc tgctcggaaa cgatctgcag gcggcagcac tctcactcaa cccggatcta 720 

cgacggacgc tgcgagcggg tgtcgaagcc ggagctttgg ccggcatcgt ctccggctcc 780 

ggaccgacgt gcgcctttct ctgcgccgac gcacagtccg cggtggaagt gagcgcagaa 840 

cttgcgggag cgggggtgtg ccgcaccgtt cgcgtggcga gcggacccgt tcccggagca 900 

cgaatactcg acaatgcggc aaagggacag cactga 936 

<210> 8 
<211> 311 
<212> PRT 

<213> Rhodococcus erythropolis
<400> 8

Met Leu Ser Val Val Pro Arg Pro Val Val Val Arg Ala Pro Ser Lys
1 5 10 15

Val Asn Leu His Leu Ala Val Gly Asp Leu Arg Asp Asp Gly Tyr His 
20 25 30 

Glu Leu Thr Thr Val Phe Gin Ala Leu Ser Leu Ala Asp Thr Val Thr 
35 40 45 

Val Ala Pro Ala Asp Thr Leu Thr Val Arg Val Ile Gly Asp Asp Ala
50 55 60 

Ala Ala Val Pro Thr Asp Arg Thr Asn Leu Val Trp Arg Ala Ala Glu 
65 70 75 80 

Met Leu Ala Ala Glu Gly Gly Val Ala Pro Asn Val Glu Ile Val Ile
85 90 95

Glu Lys Gly Ile Pro Val Ala Gly Gly Met Ala Gly Gly Ser Ala Asp
100 105 110 

Ala Ala Ala Ala Leu Val Ala Leu Asn Ser Leu Trp Lys Leu Asp Phe 
115 120 125 
Ser Arg Pro Asp Leu Asp Ala Phe Ala Ala Arg Leu Gly Ser Asp Val
130 135 140 

Pro Phe Ser Leu His Gly Gly Thr Ala Leu Gly Thr Gly Arg Gly Glu 
145 150 155 160 

Gln Leu Val Pro Val Leu Thr Arg Arg Thr Phe His Trp Val Leu Ala
165 170 175 

Leu Ala Lys Gly Gly Leu Ser Thr Pro Val Val Phe Arg Glu Leu Asp 
180 185 190 

Lys Leu Arg Ala Glu Gly Thr Pro Asn Arg Leu Gly Thr Ala Asp Glu 
195 200 205 

Leu Ile His Ala Leu Thr Thr Gly Asp Pro His Val Leu Ala Pro Leu
210 215 220

Leu Gly Asn Asp Leu Gln Ala Ala Ala Leu Ser Leu Asn Pro Asp Leu
225 230 235 240

Arg Arg Thr Leu Arg Ala Gly Val Glu Ala Gly Ala Leu Ala Gly Ile
245 250 255

Val Ser Gly Ser Gly Pro Thr Cys Ala Phe Leu Cys Ala Asp Ala Gln
260 265 270 

Ser Ala Val Glu Val Ser Ala Glu Leu Ala Gly Ala Gly Val Cys Arg 
275 280 285 

Thr Val Arg Val Ala Ser Gly Pro Val Pro Gly Ala Arg Ile Leu Asp
290 295 300

Asn Ala Ala Lys Gly Gln His
305 310 

<210> 9 
<211> 477 
<212> DNA 

<213> Rhodococcus erythropolis 
<400> 9 

atgcgcgtcg gtctcggcac ggatgttcat cccatcgagg tcggccgacc ttgctggatg 60 
gccgggttgc tgttcgagga agcagacggg tgctcggggc attcggacgg cgacgtcgcc 120 
gtccacgcgc tctgtgacgc gttgctctcc gccgcaggtc ttggcgacct cggttcggtt 180 
ttcggcaccg gcaggcccga atgggacggc gtgagcggcg ctcgaatgct tgccgaggtt 240 
cgtcgactgc tcgaagagaa ccagttcacc gtcggcaacg ccgcggtgca ggtcatcggc 300 
aaccgaccga agatcgggcc gcgacgcgac gaggcgcaga aggtgctctc ggacattctc 360 
ggcgcgcctg tttcggtgtc cgcgaccacc acggacgggc tcggcttgac cggtcgcggc 420 
gaggggatcg ccgccatggc caccgcgttg gtcatgacaa ccgaacacga caggtaa 477 

<210> 10 

<211> 158 

<212> PRT 

<213> Rhodococcus erythropolis 

<400> 10 

Met Arg Val Gly Leu Gly Thr Asp Val His Pro Ile Glu Val Gly Arg
1 5 10 15

Pro Cys Trp Met Ala Gly Leu Leu Phe Glu Glu Ala Asp Gly Cys Ser
20 25 30 

Gly His Ser Asp Gly Asp Val Ala Val His Ala Leu Cys Asp Ala Leu 
35 40 45 

Leu Ser Ala Ala Gly Leu Gly Asp Leu Gly Ser Val Phe Gly Thr Gly 
50 55 60 

Arg Pro Glu Trp Asp Gly Val Ser Gly Ala Arg Met Leu Ala Glu Val 
65 70 75 80 

Arg Arg Leu Leu Glu Glu Asn Gln Phe Thr Val Gly Asn Ala Ala Val
85 90 95

Gln Val Ile Gly Asn Arg Pro Lys Ile Gly Pro Arg Arg Asp Glu Ala
100 105 110

Gln Lys Val Leu Ser Asp Ile Leu Gly Ala Pro Val Ser Val Ser Ala
115 120 125

Thr Thr Thr Asp Gly Leu Gly Leu Thr Gly Arg Gly Glu Gly Ile Ala
130 135 140 

Ala Met Ala Thr Ala Leu Val Met Thr Thr Glu His Asp Arg 
145 150 155 

<210> 11 
<211> 1035 
<212> DNA 

<213> Rhodococcus erythropolis 
<400> 11 

gtgagcaccg aaaagactgc tgccgacgca accgcatcga gcaccgtcgt tgcaggcatc 60 

gacctgggcg acgaacagct cgccgcagta gtgcgtggtg gactctccga tgtcgaggag 120 

ttgttggtca gcgagctgtc cgacggcgaa gacttcctca ccgaggccgc gctgcatctc 180 

gcgcgagccg ggggaaagcg cttccgtccg ttgttcacga tcctgaccgc gcaactcgga 240 

ccggtgccga acgatccgtc gatcatcacc gcagcgaccg tcaccgaact cgttcacctg 300 

gcgacgctct atcacgacga cgtcatggac gaggcctcca tgcggcgcgg agcacccagc 360 

gccaacgccc gctggggaaa cagcgtggcg atcctggccg gcgactatct gttcgcgcac 420 

gcatcacgcc tggtatcgac gctcggaccc gaagctgttc ggatcatcgc cgaaaccttt 480 

gcagagctgg tcaccggcca gatgcgcgag acgatcggcg tcaagaagga acaggatccg 540 

gtcgagcatt acctcaaggt cgtgtgggag aagaccggtt cgctcatcgc tgcatccgga 600 

cgattcggcg gcactttctc cggcgccgac gcagctcaca tcgagcgcct cgagcgcctg 660 

ggtgacgccg tcggcaccgc attccagatc tccgacgaca tcatcgacat ctcctccgta 720 

tcggcgcagt ccggcaagac tccgggcacc gacctgcgcg agggtgtcca caccctgccc 780 

gtcctgtacg cgttccgcga agaaggagcc gacgcagatc gcctgcggga gctgctcgcg 840 

ggcccggtca ccgaagacgc actggtagaa gaagctctcg aactgctcga gcgttcgccg 900

ggcatggtca aggcgaaggc aaagctgggc gagtacgcag tctcggcaaa ggcccagttg 960 

gccgagctcc cgcagggacc ggcgaatgaa gcgctcgtgc gcctcgtgga ctacacgatc 1020 

gaacgagtcg gctga 1035 

<210> 12 
<211> 344 
<212> PRT 

<213> Rhodococcus erythropolis 
<400> 12 

Met Ser Thr Glu Lys Thr Ala Ala Asp Ala Thr Ala Ser Ser Thr Val 
1 5 10 15

Val Ala Gly Ile Asp Leu Gly Asp Glu Gln Leu Ala Ala Val Val Arg
20 25 30 

Gly Gly Leu Ser Asp Val Glu Glu Leu Leu Val Ser Glu Leu Ser Asp 
35 40 45 

Gly Glu Asp Phe Leu Thr Glu Ala Ala Leu His Leu Ala Arg Ala Gly 
50 55 60 

Gly Lys Arg Phe Arg Pro Leu Phe Thr Ile Leu Thr Ala Gln Leu Gly
65 70 75 80

Pro Val Pro Asn Asp Pro Ser Ile Ile Thr Ala Ala Thr Val Thr Glu
85 90 95 

Leu Val His Leu Ala Thr Leu Tyr His Asp Asp Val Met Asp Glu Ala 
100 105 110 

Ser Met Arg Arg Gly Ala Pro Ser Ala Asn Ala Arg Trp Gly Asn Ser 
115 120 125 

Val Ala Ile Leu Ala Gly Asp Tyr Leu Phe Ala His Ala Ser Arg Leu
130 135 140

Val Ser Thr Leu Gly Pro Glu Ala Val Arg Ile Ile Ala Glu Thr Phe
145 150 155 160

Ala Glu Leu Val Thr Gly Gln Met Arg Glu Thr Ile Gly Val Lys Lys
165 170 175

Glu Gln Asp Pro Val Glu His Tyr Leu Lys Val Val Trp Glu Lys Thr
180 185 190

Gly Ser Leu Ile Ala Ala Ser Gly Arg Phe Gly Gly Thr Phe Ser Gly
195 200 205

Ala Asp Ala Ala His Ile Glu Arg Leu Glu Arg Leu Gly Asp Ala Val
210 215 220

Gly Thr Ala Phe Gln Ile Ser Asp Asp Ile Ile Asp Ile Ser Ser Val
225 230 235 240

Ser Ala Gln Ser Gly Lys Thr Pro Gly Thr Asp Leu Arg Glu Gly Val
245 250 255

His Thr Leu Pro Val Leu Tyr Ala Phe Arg Glu Glu Gly Ala Asp Ala
260 265 270

Asp Arg Leu Arg Glu Leu Leu Ala Gly Pro Val Thr Glu Asp Ala Leu 
275 280 285 

Val Glu Glu Ala Leu Glu Leu Leu Glu Arg Ser Pro Gly Met Val Lys 
290 295 300 

Ala Lys Ala Lys Leu Gly Glu Tyr Ala Val Ser Ala Lys Ala Gln Leu
305 310 315 320

Ala Glu Leu Pro Gln Gly Pro Ala Asn Glu Ala Leu Val Arg Leu Val
325 330 335

Asp Tyr Thr Ile Glu Arg Val Gly
340 

<210> 13 
<211> 1140 
<212> DNA

<213> Rhodococcus erythropolis 
<400> 13 

ttggaggcca ccctgtccgc aggaaccgcg cgcgttggac agagttcgac caacaccgca 60 

ccgcatccga cctcactcga actgccaggc gtgttcgaag gagcgctccg cgacttcttc 120 

gattcacgcc gcgaactcgt ctcgaacatc ggcggtggat acgagaaagc cgtcagcacc 180 

ctcgaagcct tcgtcctgcg cggaggaaag cgcgtccggc cgtcgttcgc ctggacggga 240 

tggctcggcg ccggaggcga cccgaacggg agcggcgcgg acgcggtgat tcgtgcatgc 300 

gcggccctcg aactggtgca ggcctgcgcg ctcgtccacg acgacatcat cgacgcatca 360 

acgaccaggc gcggcttccc gaccgttcac gtcgaattcg aggaccagca ccgaggcgag 420 

gagtggagcg gcgactccgc gcacttcggc gaggccgtcg ccattctcct cggcgacctg 480 

gccttggcct gggctgacga catgatccga gaatccggga tcagccccga cgcggccgca 540 

cgagtgagcc cggtctggtc ggcaatgcgc accgaggtgc ttggtggcca attcctcgac 600 

atcagcaacg aagcccgcgg agacgagacc gtcgaggcag ccatgcgggt caaccgttac 660 

aaaaccgccg cgtacacgat cgaacgccca ctgcacctcg gcgccgcatt gttcggtgca 720 

gacgccgagt tgatcgatgc ctaccggacg ttcggcaccg acatcgggat tgccttccaa 780 

cttcgcgacg acctgctcgg tgtcttcgga gatccgtccg tcacgggcaa accgtcgggc 840 

gacgatctca tcgccggtaa gcggactgtc ctgttcgcga tggcgcttgc ccgcgccgac 900 

gccgcagatc cggcggcagc agaactgctc cgcaacggaa tcggcaccca gttgaccgac 960 

aacgaagtcg acactctgcg tcaggtgatc accgatcttg gcgccgtcac cgacgtcgaa 1020 

acgcagatcg acaccctcgt cgaggcagct gcgaacgccc tcgactcgag cacggcaacg 1080 

gcagagtcca aggctcgcct gaccgatatg gcgatcgcgg ccacgaagcg aagctactga 1140 

<210> 14 
<211> 378 
<212> PRT 

<213> Rhodococcus erythropolis
<400> 14

Met Glu Ala Thr Leu Ser Ala Gly Thr Ala Arg Val Gly Gln Ser Ser
1 5 10 15

Thr Asn Thr Ala Pro His Pro Thr Ser Leu Glu Leu Pro Gly Val Phe 
20 25 30 

Glu Gly Ala Leu Arg Asp Phe Phe Asp Ser Arg Arg Glu Leu Val Ser 
35 40 45 

Asn Ile Gly Gly Gly Tyr Glu Lys Ala Val Ser Thr Leu Glu Ala Phe
50 55 60

Leu Arg Gly Gly Lys Arg Val Arg Pro Ser Phe Ala Trp Thr Gly Trp
65 70 75 80

Leu Gly Ala Gly Gly Asp Pro Asn Gly Ser Gly Ala Asp Ala Val Ile
85 90 95

Arg Ala Cys Ala Ala Leu Glu Leu Val Gln Ala Cys Ala Leu Val His
100 105 110

Asp Asp Ile Ile Asp Ala Ser Thr Thr Arg Arg Gly Phe Pro Thr Val
115 120 125

His Val Glu Phe Glu Asp Gln His Arg Gly Glu Glu Trp Ser Gly Asp
130 135 140

Ser Ala His Phe Gly Glu Ala Val Ala Ile Leu Leu Gly Asp Leu Ala
145 150 155 160

Leu Ala Trp Ala Asp Asp Met Ile Arg Glu Ser Gly Ile Ser Pro Asp
165 170 175 

Ala Ala Ala Arg Val Ser Pro Val Trp Ser Ala Met Arg Thr Glu Val 
180 185 190 

Leu Gly Gly Gln Phe Leu Asp Ile Ser Asn Glu Ala Arg Gly Asp Glu
195 200 205

Thr Val Glu Ala Ala Met Arg Val Asn Arg Tyr Lys Thr Ala Ala Tyr
210 215 220

Thr Ile Glu Arg Pro Leu His Leu Gly Ala Ala Leu Phe Gly Ala Asp
225 230 235 240

Ala Glu Leu Ile Asp Ala Tyr Arg Thr Phe Gly Thr Asp Ile Gly Ile
245 250 255

Ala Phe Gln Leu Arg Asp Asp Leu Leu Gly Val Phe Gly Asp Pro Ser
260 265 270

Val Thr Gly Lys Pro Ser Gly Asp Asp Leu Ile Ala Gly Lys Arg Thr
275 280 285

Val Leu Phe Ala Met Ala Leu Ala Arg Ala Asp Ala Ala Asp Pro Ala
290 295 300

Ala Ala Glu Leu Leu Arg Asn Gly Ile Gly Thr Gln Leu Thr Asp Asn
305 310 315 320

Glu Val Asp Thr Leu Arg Gln Val Ile Thr Asp Leu Gly Ala Val Thr
325 330 335

Asp Val Glu Thr Gln Ile Asp Thr Leu Val Glu Ala Ala Ala Asn Ala
340 345 350

Leu Asp Ser Ser Thr Ala Thr Ala Glu Ser Lys Ala Arg Leu Thr Asp
355 360 365

Met Ala Ile Ala Ala Thr Lys Arg Ser Tyr
370 375 

<210> 15 
<211> 945 
<212> DNA 

<213> Rhodococcus erythropolis 
<400> 15 

atgaacgcat tgtctgcgtc ctatgaattc tgcgaggacg tgacgaggga acacggccga 60 
acgtactttc tggccactcg gttgctgccc gagcctcgac gccgcgcagt tcacgctctc 120 
tacgcatttg ctcgcgtcgt cgacgacgtc gtggacgaac cctcgggtcc acatgaacga 180
ggcacggtgc tcgccgacgt cgaacgtgca gccgtcaccg cactcgacaa ccccactgcg 240 
acaggtggct tcccgtcgac gattcccctc gacctgacac gcgtactccc tgccttcgcc 300 
gatgctgtga agacgttcga cattccgcgt gcatacttcg acgccttctt cgagtccatg 360 
cggatggacg cccccgacac cgcgaagttt cgacccgtct acaacacgat ggacgagctt 420

gccgagtaca tgtacggctc cgccgtcgtc atcggtttgc agatgctccc gattctcgga 480 

gtgagcgttc cgcagcagga agctgtagtg cccgcgtcga atctcggtga ggcgtttcag 540 
ctgaccaact tcatccgcga cgtcggtgaa gacctcgacc ggggacgtct gtatctcccg 600
gcgggcgagt tcgccgcatt cggggtcgac atcgagatgc tcgagcacgg gcgcagaacc 660

ggaacggtgg acgttcgggt caagcgcgcg ctggcacact tcattgcagt gacgcggggg 720 

cggtatcggt ccgccgaatc cggcatcccg atgctcgatc ggcgggtcca gccgtcgatc 780 

cgcacggctt tcgtgttgta cggagcaatt ctcgaccagg tcgagcgcgc cgacttccgg 840 

atactgcatc gacgagtgtc cgttcccgga cgcacgcgac ttcgagtcgc tgcgccgggt 900 

ctggtccggt cggcaaccta cgcggcgaaa aaccgcatga ggtga 945 

<210> 16 

<211> 314 

<212> PRT 

<213> Rhodococcus erythropolis 

<400> 16 

Met Asn Ala Leu Ser Ala Ser Tyr Glu Phe Cys Glu Asp Val Thr Arg 
1 5 10 15 

Glu His Gly Arg Thr Tyr Phe Leu Ala Thr Arg Leu Leu Pro Glu Pro 
20 25 30 

Arg Arg Arg Ala Val His Ala Leu Tyr Ala Phe Ala Arg Val Val Asp 
35 40 45 

Asp Val Val Asp Glu Pro Ser Gly Pro His Glu Arg Gly Thr Val Leu 
50 55 60 

Ala Asp Val Glu Arg Ala Ala Val Thr Ala Leu Asp Asn Pro Thr Ala 
65 70 75 80 

Thr Gly Gly Phe Pro Ser Thr Ile Pro Leu Asp Leu Thr Arg Val Leu
85 90 95 

Pro Ala Phe Ala Asp Ala Val Lys Thr Phe Asp Ile Pro Arg Ala Tyr
100 105 110 

Phe Asp Ala Phe Phe Glu Ser Met Arg Met Asp Ala Pro Asp Thr Ala
115 120 125

Lys Phe Arg Pro Val Tyr Asn Thr Met Asp Glu Leu Ala Glu Tyr Met 
130 135 140 

Tyr Gly Ser Ala Val Val Ile Gly Leu Gln Met Leu Pro Ile Leu Gly
145 150 155 160 

Val Ser Val Pro Gln Gln Glu Ala Val Val Pro Ala Ser Asn Leu Gly
165 170 175 

Glu Ala Phe Gln Leu Thr Asn Phe Ile Arg Asp Val Gly Glu Asp Leu
180 185 190 

Asp Arg Gly Arg Leu Tyr Leu Pro Ala Gly Glu Phe Ala Ala Phe Gly 
195 200 205 

Val Asp Ile Glu Met Leu Glu His Gly Arg Arg Thr Gly Thr Val Asp
210 215 220 
Val Arg Val Lys Arg Ala Leu Ala His Phe Ile Ala Val Thr Arg Gly
225 230 235 240 

Arg Tyr Arg Ser Ala Glu Ser Gly Ile Pro Met Leu Asp Arg Arg Val
245 250 255 

Gln Pro Ser Ile Arg Thr Ala Phe Val Leu Tyr Gly Ala Ile Leu Asp
260 265 270 

Gln Val Glu Arg Ala Asp Phe Arg Ile Leu His Arg Arg Val Ser Val
275 280 285 

Pro Gly Arg Thr Arg Leu Arg Val Ala Ala Pro Gly Leu Val Arg Ser 
290 295 300 

Ala Thr Tyr Ala Ala Lys Asn Arg Met Arg 
305 310 

<210> 17 
<211> 1593 
<212> DNA 

<213> Rhodococcus erythropolis 
<400> 17

gtggcagacg tgcaccgcac tcgaaccgtc agctcgccga ccgatcgagt cgtgatcgtc 60 

ggcgcgggac ttgccggact gtctgcgggg ttgtatctgc gtggcgccgg ccgcgacgtc 120 

acgatcctcg agagcaacgg ctcggtcggc gggcgagtcg gtgtctacca gggcagtgac 180 

tacagcatcg acaacggcgc aacggtgctc acgatgcccg aactcgtcga agacgctctt 240 

gcggccgtcg gcgccgaccc cgactcgaca aaccccaaat tcgttgtgca caagctcgat 300

ccgacgtacc acgcgcgatt cgcagacggc acctctctcg atgttcacgc cgaccccgaa 360 

gacatggctg ccgaagtctc tcgtgtctgc gggccggaag aagcgcagcg ataccgtgcg 420 

ttgcggcgat ggctgaaccg catcttcgac gcggaattcg accgcttcat ggacgccgac 480 

ttcgattctc ccctcggact ggtcaattcg cgtgaagcag tcaaggatct gagccgactc 540 

gtcgcactgg gaggattcgg gaaactgggc gggcaggtgg atcgcaagat ccgcgaccct 600 

cgcctccggc ggatcttcac tttccaagcg ctgtatgcgg gagttgctcc gtctcgagcc 660 

ctcgcggtgt acggggcgat cgctcacatg gacacctcac tgggcgtcta ctttcccgag 720 

ggcgggatgc gcacgatcgc cgagtcgatg gccgacgctt tcaccgaggc cggcggaatt 780 

ctgcatctcg gccgcacggt cgaacgactc gaggtgagcg accgtcgcgt gcgtgccgta 840 

cacacatgcg acggtgagag cttcgactgt gacgtcgcag tcctcacccc cgacatggcc 900 

gtcacggact ccctcttgcg cccgcatacg cgattgcgcc cgcgaccggt gcgtacatcg 960 

ccgtccgcgg tcgtgattca cggcactgtt tcttcagccg tcgccgacgg atggcccgcg 1020 

cagcgacacc acatgatcga cttcggcgag gcgtggaagc gcaccttcgc cgagatcacg 1080 

gcacgccgcg gccgcgggca attgatgagt gatccgtcac tgctcgtcac ccgaccggcg 1140 

cagaccgacc cgagcctggc cttctcgcga gacggccgga tccgtgaacc gctgtcagtc 1200 

ctcgcgccgt gcccgaatct ggacagtgcg ccgctcgact gggcagttct cggcccggcc 1260 

tacgtgcgtg aaatcatcct cacgctgcaa gaacgtggct atacgggact ggtcgagggg 1320 

ttcgatatcg atcacgtcga caccccgcag acctggctcg agaagggcat ggccgcgggt 1380

agcccgttcg cggcggcaca caccttcacc cagacggggc cgttccgacg caagaacctc 1440

gcccgcggct tcgacaacgt cgttctcgcc ggatcgggaa ccgttccggg ggtgggagta 1500 

ccgaccgttc tgctgtccgg ccggctcgcc gccgaacgta ttaccggtac acgcgagcga 1560

gccagcgcgg tgggcactcg tgcgagcaac taa 1593 

<210> 18 
<211> 530 
<212> PRT 

<213> Rhodococcus erythropolis 
<400> 18 

Met Ala Asp Val His Arg Thr Arg Thr Val Ser Ser Pro Thr Asp Arg 
1 5 10 15

Val Val Ile Val Gly Ala Gly Leu Ala Gly Leu Ser Ala Gly Leu Tyr
20 25 30 

Leu Arg Gly Ala Gly Arg Asp Val Thr Ile Leu Glu Ser Asn Gly Ser
35 40 45 

Val Gly Gly Arg Val Gly Val Tyr Gln Gly Ser Asp Tyr Ser Ile Asp
50 55 60 

Asn Gly Ala Thr Val Leu Thr Met Pro Glu Leu Val Glu Asp Ala Leu 
65 70 75 80 

Ala Ala Val Gly Ala Asp Pro Asp Ser Thr Asn Pro Lys Phe Val Val 
85 90 95 

His Lys Leu Asp Pro Thr Tyr His Ala Arg Phe Ala Asp Gly Thr Ser 
100 105 110 

Leu Asp Val His Ala Asp Pro Glu Asp Met Ala Ala Glu Val Ser Arg 
115 120 125 

Val Cys Gly Pro Glu Glu Ala Gln Arg Tyr Arg Ala Leu Arg Arg Trp
130 135 140 

Leu Asn Arg Ile Phe Asp Ala Glu Phe Asp Arg Phe Met Asp Ala Asp
145 150 155 160 

Phe Asp Ser Pro Leu Gly Leu Val Asn Ser Arg Glu Ala Val Lys Asp 
165 170 175 

Leu Ser Arg Leu Val Ala Leu Gly Gly Phe Gly Lys Leu Gly Gly Gln
180 185 190 

Val Asp Arg Lys Ile Arg Asp Pro Arg Leu Arg Arg Ile Phe Thr Phe
195 200 205 

Gln Ala Leu Tyr Ala Gly Val Ala Pro Ser Arg Ala Leu Ala Val Tyr
210 215 220 

Gly Ala Ile Ala His Met Asp Thr Ser Leu Gly Val Tyr Phe Pro Glu
225 230 235 240 

Gly Gly Met Arg Thr Ile Ala Glu Ser Met Ala Asp Ala Phe Thr Glu
245 250 255 

Ala Gly Gly Ile Leu His Leu Gly Arg Thr Val Glu Arg Leu Glu Val
260 265 270

Ser Asp Arg Arg Val Arg Ala Val His Thr Cys Asp Gly Glu Ser Phe 
275 280 285 

Asp Cys Asp Val Ala Val Leu Thr Pro Asp Met Ala Val Thr Asp Ser 
290 295 300 

Leu Leu Arg Pro His Thr Arg Leu Arg Pro Arg Pro Val Arg Thr Ser 
305 310 315 320 

Pro Ser Ala Val Val Ile His Gly Thr Val Ser Ser Ala Val Ala Asp
325 330 335 
Gly Trp Pro Ala Gln Arg His His Met Ile Asp Phe Gly Glu Ala Trp
340 345 350 

Lys Arg Thr Phe Ala Glu Ile Thr Ala Arg Arg Gly Arg Gly Gln Leu
355 360 365 

Met Ser Asp Pro Ser Leu Leu Val Thr Arg Pro Ala Gln Thr Asp Pro
370 375 380 

Ser Leu Ala Phe Ser Arg Asp Gly Arg Ile Arg Glu Pro Leu Ser Val
385 390 395 400 

Leu Ala Pro Cys Pro Asn Leu Asp Ser Ala Pro Leu Asp Trp Ala Val 
405 410 415 

Leu Gly Pro Ala Tyr Val Arg Glu Ile Ile Leu Thr Leu Gln Glu Arg
420 425 430 

Gly Tyr Thr Gly Leu Val Glu Gly Phe Asp Ile Asp His Val Asp Thr
435 440 445 

Pro Gln Thr Trp Leu Glu Lys Gly Met Ala Ala Gly Ser Pro Phe Ala
450 455 460 

Ala Ala His Thr Phe Thr Gln Thr Gly Pro Phe Arg Arg Lys Asn Leu
465 470 475 480 

Ala Arg Gly Phe Asp Asn Val Val Leu Ala Gly Ser Gly Thr Val Pro 
485 490 495 

Gly Val Gly Val Pro Thr Val Leu Leu Ser Gly Arg Leu Ala Ala Glu 
500 505 510 

Arg Ile Thr Gly Thr Arg Glu Arg Ala Ser Ala Val Gly Thr Arg Ala
515 520 525 

Ser Asn 
530 

<210> 19 
<211> 1131 
<212> DNA 

<213> Rhodococcus erythropolis 

<400> 19 

atgagcacac tcgactcctc cgccgacgtg gtgatcgtgg gcggagggcc ggcggggcgg 60

gcactcgcga cgcgttgtat cgcccggcaa ctcactgttg tcgttgtcga tccgcatcct 120 

catcgggtgt ggacgccgac gtactcggtg tgggcagacg agctgccgtc gtggctgccg 180 

gacgaggtga tcgcgagccg aatcgaacgc ccgagcgtgt ggaccagcgg gcagaaaacg 240 

cttgatcgca tctattgcgt attgaataca tctttactgc aatcatttct ctcccacaca 300 

tccataaagg tcagaggctt acgcgctcaa acactgtcca ccaccaccgt cgtgtgcgtg 360 

gacggatcgc agctgacggg atccgtcgtc gtcgacgccc gaggcaccga tctggcagtg 420 

acaaccgcgc agcagacggc cttcggaatg atcgtggacc gagctctggc cgatccgatt 480 

ctgggcggca gcgaggcctg gttcatggac tggcgaacag acaacggcac ctccgacgcc 540 

gacactccgt cgtttctcta cgcggtcccg ctcgacgacg agcgagtcct cctcgaggag 600 

acctgcctcg tcggccggcc ggcgttgggg ttgcgtgaac tcgaaacacg tctgcgcacc 660 

cgacttcaca atcggggctg cgaagtcccc gacgacgcgc cggtcgagcg agtccgtttt 720 

gcggtcgaag gcccgaggga ctcgtccccg gacggtgtcc tccggttcgg cgggcgaggc 780 

ggtctgatgc atccgggaac cggatacagc gttgcctcct cactcgccga ggccgacact 840 
gtcgcgaaag caatcgccga cggtgaggat ccgaacgcgg cactctggcc tcgctcggcc 900

aaggcggtat ccgctctccg ccgcgttggt ctgaacgcac ttctcaccct cgactcgggc 960 

gaagtcacca cattcttcga caagttcttc gatctaccgg tcgaggctca gcggtcatac 1020 

ctttccgatc ggcgggacgc ggccgcgacg gcgaaggtga tggcaacact gttccgatcg 1080 

tcaccgtggc acgtcagaaa gacgttgatg cgcgcgccgt ttttccggtg a 1131 

<210> 20 
<211> 376 
<212> PRT 

<213> Rhodococcus erythropolis 
<400> 20 

Met Ser Thr Leu Asp Ser Ser Ala Asp Val Val Ile Val Gly Gly Gly
1 5 10 15

Pro Ala Gly Arg Ala Leu Ala Thr Arg Cys Ile Ala Arg Gln Leu Thr
20 25 30 

Val Val Val Val Asp Pro His Pro His Arg Val Trp Thr Pro Thr Tyr 
35 40 45 

Ser Val Trp Ala Asp Glu Leu Pro Ser Trp Leu Pro Asp Glu Val Ile
50 55 60 

Ala Ser Arg Ile Glu Arg Pro Ser Val Trp Thr Ser Gly Gln Lys Thr
65 70 75 80 

Leu Asp Arg Ile Tyr Cys Val Leu Asn Thr Ser Leu Leu Gln Ser Phe
85 90 95 

Leu Ser His Thr Ser Ile Lys Val Arg Gly Leu Arg Ala Gln Thr Leu
100 105 110 

Ser Thr Thr Thr Val Val Cys Val Asp Gly Ser Gln Leu Thr Gly Ser
115 120 125 

Val Val Val Asp Ala Arg Gly Thr Asp Leu Ala Val Thr Thr Ala Gln
130 135 140 

Gln Thr Ala Phe Gly Met Ile Val Asp Arg Ala Leu Ala Asp Pro Ile
145 150 155 160 

Leu Gly Gly Ser Glu Ala Trp Phe Met Asp Trp Arg Thr Asp Asn Gly 
165 170 175 

Thr Ser Asp Ala Asp Thr Pro Ser Phe Leu Tyr Ala Val Pro Leu Asp 
180 185 190 

Asp Glu Arg Val Leu Leu Glu Glu Thr Cys Leu Val Gly Arg Pro Ala 
195 200 205 

Leu Gly Leu Arg Glu Leu Glu Thr Arg Leu Arg Thr Arg Leu His Asn 
210 215 220 

Arg Gly Cys Glu Val Pro Asp Asp Ala Pro Val Glu Arg Val Arg Phe 
225 230 235 240 

Ala Val Glu Gly Pro Arg Asp Ser Ser Pro Asp Gly Val Leu Arg Phe 
245 250 255 
Gly Gly Arg Gly Gly Leu Met His Pro Gly Thr Gly Tyr Ser Val Ala
260 265 270 

Ser Ser Leu Ala Glu Ala Asp Thr Val Ala Lys Ala Ile Ala Asp Gly
275 280 285 

Glu Asp Pro Asn Ala Ala Leu Trp Pro Arg Ser Ala Lys Ala Val Ser 
290 295 300 

Ala Leu Arg Arg Val Gly Leu Asn Ala Leu Leu Thr Leu Asp Ser Gly 
305 310 315 320 

Glu Val Thr Thr Phe Phe Asp Lys Phe Phe Asp Leu Pro Val Glu Ala 
325 330 335 

Gln Arg Ser Tyr Leu Ser Asp Arg Arg Asp Ala Ala Ala Thr Ala Lys
340 345 350 

Val Met Ala Thr Leu Phe Arg Ser Ser Pro Trp His Val Arg Lys Thr 
355 360 365 

Leu Met Arg Ala Pro Phe Phe Arg 
370 375 

<210> 21 
<211> 19 
<212> DNA 

<213> artificial sequence: primer 
<400> 21 

gagtttgatc ctggctcag 19 

<210> 22 
<211> 16
<212> DNA 

<213> artificial sequence: primer 
<400> 22 

taccttgtta cgactt 16 

<210> 23 
<211> 17 
<212> DNA 

<213> artificial sequence: primer 
<400> 23 

gtgccagcag ymgcggt 17

<210> 24 
<211> 20 
<212> DNA 

<213> artificial sequence: primer 
<400> 24 

atttcgttga acggctcgcc 20 

<210> 25 
<211> 20 
<212> DNA 

<213> artificial sequence: primer 
<400> 25

cggcaatccg acctctacca 20



<210> 26 

<211> 20 

<212> DNA 

<213> artificial sequence: primer 

<400> 26 

tgagacgagc cgtcagcctt 20 

<210> 27

<211> 29

<212> DNA

<213> artificial sequence: primer

<400> 27

catgccatgg cctcgaagcc ttcgtcctg 29

<210> 28

<211> 30

<212> DNA

<213> artificial sequence: primer

<400> 28

catgccatgg cgcagagtgt cgacttcgtt 30

<210> 29

<211> 32

<212> DNA

<213> artificial sequence: primer

<400> 29

ttcatgccat ggactcgtcg aagacgctct tg 32

<210> 30

<211> 30

<212> DNA

<213> artificial sequence: primer

<400> 30

ttcatgccat ggtgacgagc agtgacggat 30

<210> 31

<211> 18

<212> DNA

<213> artificial sequence: primer

<400> 31

agcggcatca gcaccttg 18

<210> 32

<211> 21

<212> DNA

<213> artificial sequence: primer

<400> 32

gccaatatgg acaacttctt c 21

<210> 33
<211> 24 
<212> DNA 

<213> artificial sequence: primer 
<400> 33 

atccgacctc actcgaactg ccag 24

<210> 34 
<211> 24 
<212> DNA 

<213> artificial sequence: primer 
<400> 34 

ggtcggcgag ctgacggttc gagt 24

<210> 35 
<211> 24 
<212> DNA 

<213> artificial sequence: primer 
<400> 35 

cggccacgaa gcgaagctac tgac 24

<210> 36 
<211> 24 
<212> DNA 

<213> artificial sequence: primer 
<400> 36 

atcgtggatg aatggtcggt tacg 24